TEST OF NONVERBAL INTELLIGENCE 

SECOND EDITION 


A Language-Free Measure of Cognitive Ability 


Linda Brown 
Rita J. Sherbenou 
Susan K. Johnsen 


pro-ed 

8700 Shoal Creek Boulevard 
Austin, Texas 78758-6897 



Contents 


Acknowledgments / iv 

List of Tables and Figures / vi 

1 An Overview of the TONI-2 / 1 

The Purposes of Intelligence Tests / 1 
The Need for the TONI-2 / 2 
Description of the TONI-2 / 2 
Uses of the TONI-2 / 5 
Summary / 5 


2 Administering and Scoring the TONI-2 / 7 

Administration Procedures / 7 
Scoring Procedures / 13 

3 Interpreting the Results of the TONI-2 / 17 

Completing the Answer Booklet and Record Form / 17 

Interpreting Test Scores / 20 

Using the Standard Error of Measurement / 23 

Interpreting Deviant Scores / 23 

Interpreting Score Differences / 24 

Developing Local and Specialized Norms / 24 

Sharing the Results / 25 

Caveat Utilitor / 25 


4 Developing the TONI-2 / 29 

Item Analysis and Selection / 29 
Standardization and Normative Procedures / 32 
Test Reliability / 35 
Test Validity / 39 
Summary / 48 


References / 51 


Appendix / 55 



Acknowledgments 


We could not have built the TONI-2 without the
assistance of individual classes and schools, entire
school districts, state departments of public instruction,
child study centers, universities, community
agencies, and businesses across the country. These
organizations and the professionals who are associated
with them were generous in their contributions
of time, data, and advice. We are indebted to them.

Austin, Texas, Parks and Recreation Department 
Eastern Montana State University, Billings 
Flandreau, South Dakota, Indian Reservation 
Huntsville, Texas, Independent School District 
Jericho, New York, Junior High School 
Joliet, Illinois, Public Schools 
Lame Deer, Montana, Indian Reservation 
Leavenworth, Kansas, Area Council on Aging 
Quincy, Washington, Public Schools 
Smith College, Northampton, Massachusetts 
Stockton Scottish Rite, Stockton, California 
Texas School for the Deaf, Austin 
U.S.D. 453, Leavenworth, Kansas 
U.S.D. 207, Fort Leavenworth, Kansas 

(ALABAMA) Edith Hambrick; (ALASKA) Beverly
Raphel; (ARIZONA) Theresa Renfro, Mary
Trienzo; (CALIFORNIA) Sherry Brown, Shirley 
Chamberlain, Donna Charbeneau, Elise Lindvig, 
Kathy Newton, Sherry Rhodes; (CONNECTICUT) 
Mara Reilly; (FLORIDA) Warren Angstadt, Molly 


Bowell, Joan Ewing, Mary Garrett, Susan Melcher, 
Joyce Burick Swarzman; (GEORGIA) Karen Heath, 
Maurice Herrin; (HAWAII) Abe Tokioka; (IDAHO) 
Tim Hope, Carol Sutter; (ILLINOIS) Rebecca K. Fries, 
Bethany Graham, Martin Kinert, Ed Parpart, Tim 
Thomas; (IOWA) S. Pike Hall; (KANSAS) Beth
Brown, Thurma DeLoach, Tom Devlin, John Dickinson,
Janet Earl, Jimmy Evans, Virginia Fortner,
Karen Jaquith, Leigh Johnson, Erryn Langdon, Gaye 
Neuberger, Ellen Peters, Miriam Phillips, Clyde Ransom,
Mike Slusher, Bob Strano, Tom Tracy, Brenda
Traughber; (KENTUCKY) Nedra Boitnott, Mary 
Gregory, Judy Huttman; (MARYLAND) Henry Allen 
Groff, Jr.; (MASSACHUSETTS) Marion Nesbit; 
(MICHIGAN) Chuck Ducher, Matt Laboda, Leanne 
Miller, Jay Raycraft, Hollie Sabatos; (MONTANA) 
Susan Newell, Lois Sindelar; (NEBRASKA) Joan 
Eigenberg; (NEW YORK) Nannette Gershowitz, 
Mitchell Parker, Lois Smith; (NORTH CAROLINA) 
Raymond Privott; (OHIO) William Divis, Ila Johnson; 
(OREGON) Stacy Griffin, Adriana Martinez, Carl 
Serpa; (PENNSYLVANIA) Adrien Gauthier, Ken la 
Borde, Susan Mincavage; (SOUTH DAKOTA) Becky 
Leibel, Kathy Leibel, Linda Leibel, Vance Sneeve; 
(TEXAS) Barbara Ayres, Pam Barnes, Nancy Bogdanski,
Jill Clyburn, Barbara Cocanougher, Dana
Cochran, Gary Collins, Betty Ann Courtney, Terri 
Duley, Pamela Durr, Cory Duty, Judy Embrey, Pam 
Evans, Suzy Evans, E. C. Feazall, Vic Galloway, 
Earnest Grover, Phil Haas, Teresa Hand, Wylie 





Hardin, Kathy Hargrove, Patricia Hawes, Charlie 
Horton, Linda Houston, Joanne Ibsen, Herk Johnson, 
Glenda Kennedy, Bertie Kingore, Terry Masters, 
James Thomas McLean, Roy Mendez, Bonnie Patrick, 
James Peterson, Gerald Pollard, Michaela Ritter, 
Beverly Salas, Piret Sari, Felisha Stein, W. F. Stiles, 
Sherri Tubb, Anne Tucker, Gail Tucker, Susan Volz, 
Martha Wolfe; (UTAH) Louise Francis; (VIRGINIA) 
Joan Goodship, Joella Pierce; (WASHINGTON) Jim 
Culp, Steve Dal Porto, Ted Johnson, Jim Maxwell, 
Virgie Maxwell, Gene Rosenburger, Dave Rossing, 
Melanie Serpa, Loren Sherbenou, Mary Ellen Thrall; 
(WEST VIRGINIA) Larry Legg; and (WISCONSIN) 
Debbie Hyslop, Sr. Marie Petrie. 

Special thanks go to the members of the PRO-ED
Research Department and Production Department for
their good suggestions, professional assistance, and
friendly support throughout the development and
production of the TONI-2.


Note 

Clinicians and researchers who use the TONI-2
are encouraged to send copies of their work to the
authors in care of PRO-ED, 8700 Shoal Creek Boulevard,
Austin, Texas 78758-6897. We will be pleased
to cite appropriate and well-designed independent
research in future revisions of the test and its manual.





List of Tables and Figures 


Table 3.1   Guidelines for interpreting TONI-2 quotients
Table 3.2   Converting various standard scores into percentile ranks and TONI-2 quotients
Table 3.3   Standard error of measurement at various ages
Table 4.1   Median item-total correlations at various ages
Table 4.2   Median item difficulty percentages at various ages
Table 4.3   Demographic characteristics of the TONI-2 normative group
Table 4.4   Coefficients Alpha for the TONI-2 at various ages
Table 4.5   Immediate test-retest with alternate forms reliability coefficients at various ages
Table 4.6   Internal consistency reliability of the TONI-2 with special populations
Table 4.7   Correlation of the TONI-2 with various measures of achievement
Table 4.8   Correlation of the TONI-2 with various measures of aptitude, general intelligence, and
            developmental abilities
Table 4.9   Regression equation for WISC-R Full Scale IQ by QT and TONI-2
Table 4.10  Rotated factor pattern matrix based on the TONI-2 performance of normally achieving Indian
            subjects
Table 4.11  Mean TONI-2 scores for various groups
Table 4.12  Medians and ranges of item-total correlations for Forms A and B total scores with the
            alternate form’s items
Table A:    Converting raw scores to TONI-2 quotients and percentile ranks for Form A
Table B:    Converting raw scores to TONI-2 quotients and percentile ranks for Form B

Figure 2.1  Response choice numbers
Figure 2.2  Responses marked on the Answer Booklet and Record Form
Figure 2.3  Correct application of basals and ceilings
Figure 3.1  Front page of the Answer Booklet and Record Form completed for A1
Figure 3.2  Last page of the Answer Booklet and Record Form completed for A1




1 

An Overview of the TONI-2


Intelligence tests are widely used in schools, in
industry, and in other sectors of our communities.
Unfortunately, few suitable tests have been developed
for use with populations who require language-free,
motor-reduced, or culture-reduced testing formats.
The Test of Nonverbal Intelligence, Second Edition
(TONI-2) was built to fill this void. It is a highly
standardized, psychometrically sound, norm-referenced
intelligence test with an administration and response
format that eliminates language and reduces motoric
and cultural factors. This first chapter of the TONI-2
manual briefly summarizes the purposes for which
intelligence tests are used, establishes the need for
the TONI-2, and concludes with a description of the
test.


The Purposes of Intelligence Tests 

Binet and Simon (1905) developed the first practical
intelligence test for the purpose of identifying
children who would be likely to profit from a public
education in the schools of Paris. In the more than
80 years since Binet and Simon gave form and substance
to the concept of intelligence and aptitude
testing, we have become increasingly fond of this
endeavor and increasingly sophisticated in our
instrumentation. Statisticians have labored to refine the
techniques used in the study of intelligence, thereby
improving our theories of intelligence and intelligent
behavior (e.g., Galton, 1869; Guilford, 1967; Spearman,
1914; Terman, 1906; Thurstone, 1938). Today,
psychometrists have a variety of intricate and relatively
precise instruments from which to choose. Yet
the basic purpose for which intelligence tests are used
remains essentially unchanged: We use intelligence
tests to estimate an individual’s aptitude and potential
for achievement.

The use of intelligence tests has become common
practice in the schools. Group tests are routinely
administered to all school children in order to refine
demographic information and also to screen students
whose probability for success or failure is particularly
high. Scores from individual intelligence tests are one
of several criteria used to determine eligibility for
special programs, particularly those for students
suspected of being mentally retarded, learning disabled,
or gifted and talented. Intelligence tests also are
included in entrance criteria for many college and
vocational training programs.

Because they are predictors of occupational attainment
as well as academic success, intelligence tests
are often used in industry and by public and private
employment commissions to gauge the potential of job
applicants and employees being considered for
promotions. In the community, therapeutic agencies such
as hospitals, clinics, and rehabilitation centers find
that the planning and implementation of treatment
programs may be aided by an evaluation of the
intellectual functioning of clients and applicants.

A controversial use of intelligence tests in school
programs is to place students who display particular
patterns of cognitive strengths and weaknesses into
specific curricular programs or instructional formats.
This practice is based on the aptitude-treatment
interaction research of Cronbach and Snow (1977) and
assumes that test scores, especially intelligence test
scores, yield meaningful information about learning
styles and modality preferences that have direct
applications to classroom instruction and other
interventions. Unfortunately, there is no empirical evidence
to support the validity of such approaches, and the
literature is replete with disputing data and theoretical
cautions (e.g., Arter & Jenkins, 1979; Bloom &
Raskin, 1980; Bloom, Wagner, & Bergman, 1980;
Kaufman, 1979; Lloyd, 1984; Myers & Hammill,
1990; Sattler, 1988).


The Need for the TONI-2 

Despite the widespread use of intelligence tests
and the availability of many empirically and theoretically
sound instruments, there is a pressing need for
an intelligence test that is not heavily loaded with
linguistic, motoric, and cultural factors.

Language and motor skills are prominent components
of most of the intelligence measures currently
in use. Yet there are many subjects for whom such
measures simply are not appropriate. People who are
unable to read or write, people who have poor or
impaired linguistic skills, people who are from different
linguistic backgrounds, and people with limited
motor ability require a testing format that is entirely
free of listening, speaking, reading, writing, and
substantial motor responses. A large portion of the people
referred for evaluation by schools, clinics, and
hospitals fall into one of these categories. They
include aphasics, non-English speakers, individuals
who are learning disabled, individuals who are deaf,
and individuals who have suffered severe neurological
trauma through head injury, stroke, cerebral palsy,
and similar conditions.

There also are very few well-built tests that do
not have heavy cultural loadings. Certainly, no test
can be “culture free.” Testing itself is a highly
culturally loaded undertaking. Still, in a country such
as the United States where individuals from diverse
cultures are encountered in the schools and the work
force in increasing numbers, there is a need for
intellectual measures that are “less dependent on exposure
to specific language symbols . . . [and thereby
have] minimized bias against subjects from racial,
ethnic, and socioeconomic as well as linguistic
minorities” (Sattler, 1988, p. 383).


Description of the TONI-2

The TONI-2 is a language-free measure of
abstract/figural problem solving. It is intended to be
used with subjects ranging in age from 5-0 through
85-11 years. There are two equivalent forms of the
TONI-2, Form A and Form B, each containing 55
items arranged in order of difficulty.
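
The age limits above appear to use years-months notation, so that 5-0 means 5 years, 0 months and 85-11 means 85 years, 11 months. That reading is an assumption, not something the text states. Under it, a check of whether a subject falls within the intended range could be sketched as follows; the function names are hypothetical and are not part of the TONI-2 materials.

```python
from typing import Tuple

# Age limits quoted in the text, read as (years, months). Assumed notation.
MIN_AGE: Tuple[int, int] = (5, 0)
MAX_AGE: Tuple[int, int] = (85, 11)


def parse_age(text: str) -> Tuple[int, int]:
    """Parse an age written as 'years-months', for example '7-6'."""
    years_text, months_text = text.split("-")
    years, months = int(years_text), int(months_text)
    if years < 0 or not 0 <= months <= 11:
        raise ValueError(f"not a valid years-months age: {text!r}")
    return years, months


def in_toni2_age_range(text: str) -> bool:
    """Return True when the age lies between 5-0 and 85-11 inclusive."""
    # Tuples compare element by element, so (years, months) orders correctly.
    return MIN_AGE <= parse_age(text) <= MAX_AGE


if __name__ == "__main__":
    for age in ("4-11", "5-0", "42-3", "85-11", "86-0"):
        print(age, in_toni2_age_range(age))
```

Keeping the age as a (years, months) pair avoids rounding a subject who is 85 years, 11 months old up or down to a whole year.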

Jensen (1980) discussed guidelines or dimensions
that might characterize a language-free, culturally
reduced test. First, performance measures should be
used rather than paper-and-pencil tasks. Second,
instructions should be pantomimed to the subject and
not conveyed orally or in writing. Third, the test
should include preliminary practice items. Fourth,
the test should not be timed. Fifth, instead of containing
pictures or reading passages, the content of the
test items should be abstract. Sixth, the test should
require reasoning or problem solving rather than the
recall of specific factual information. Finally, novel
problems should be used in the test items to avoid
contamination from the recall of previously learned
information. In the following two sections,
“Administration and Response Format” and “Problem Solving
Tasks,” the reader will see that the TONI-2 is
faithful to all seven guidelines.

Administration and Response Format 

The TONI-2 is not a timed test. It requires
approximately 15 minutes to administer. The format
for administering the test is completely free of
language. There is no listening, speaking, reading, or
writing required in giving the test or responding to
the test items. In addition, the subject is required to
make only a minimal motoric response to the TONI-2
items.

The test items are contained in a picture book,
printed one item per page. Each form of the TONI-2
contains six training items that are administered
prior to the 55 actual items. The examiner pantomimes
instructions. The subject then indicates her or
his choice by pointing or making some other
meaningful motor response.

Problem Solving Tasks 

The basis of all the TONI-2 items is problem solving
and the content is abstract/figural. Many
intelligence tests attempt to tap a variety of intelligent
behaviors; the Detroit Tests of Learning Aptitude,
School Edition (Hammill, 1991), the Wechsler Adult
Intelligence Scale-Revised (Wechsler, 1981), and the
Slosson Intelligence Test for Children and
Adults-Revised (Slosson, 1985) are a few examples. It
is not unusual, however, for a test to assess a single
component or aspect of intelligence. Probably the most
prominent example of single construct tests is the picture
vocabulary test; the Peabody Picture Vocabulary
Test-Revised (Dunn & Dunn, 1981) and the Quick
Test (Ammons & Ammons, 1962) come to mind
immediately.

We selected abstract/figural problem solving as
the core of the TONI-2 for two reasons. First, problem
solving appears to be a general component or construct
of intelligent behavior rather than being a
splinter skill or subcomponent (Sternberg, 1980,
1984). “A major aspect of intelligence is ability to
solve problems” (Resnick & Glaser, 1976, p. 205). In
discussing Gardner’s (1983) theory of multiple
intelligences, Walters and Gardner (1986) note,
“Intelligence is an ability or set of abilities that permits an
individual to solve problems or fashion products. . . .
The problem-solving skill permits one to approach a
situation, in which a goal is to be obtained, and to
locate and pursue appropriate routes to that goal” (p.
165). Problem solving, then, is a pervasive activity
that reflects the level of intellectual functioning of the
problem solver. Second, on an entirely practical level,
the problem solving process and the abstract/figural
content both lend themselves readily to the nonverbal,
motor-reduced testing format that we wanted to
use in the TONI-2. Finally, the abstract/figural content
ensures that test items are free of language and
cultural indicators. The nature of the content also
ensures that the material contained in each item is
novel to all of the subjects who may take the test.

The TONI-2 items require test subjects to solve
problems by identifying relationships among abstract
figures and then solving problems created by the
manipulation of these relationships. Each item presents
a stimulus pattern in which one or more of the
figures comprising the pattern is missing. The subject
completes the pattern by selecting the correct
response from among either four or six alternatives.

The figures in TONI-2 items contain one or more
of the following characteristics: shape, position,
direction, rotation, contiguity, shading, size, length,
movement, and figured pattern. For the most part, the
more difficult items contain several of these
characteristics while the easier items contain only one or
two.

Item difficulty also is increased by manipulating
the type and number of problem solving rules that
must be applied in order to arrive at a solution.
Subjects must examine each item, focusing on differences
and similarities among the figures in the stimulus
pattern and also among the figures in the various
response alternatives. They must identify the rule or
rules that are operating among the figures and,
following the appropriate rules, select the correct
response alternative. For the more difficult items,
there may be several possible solutions that would
successfully complete the stimulus pattern, but only
one correct solution will be included among the
response alternatives.

One or more of the following rules are used in each 
TONI-2 item. 

1. Simple Matching. All figures share the same 
number of critical attributes. No differences 
exist among the figures in the stimulus. 





2. Analogies. The relationship among the figures 
in one of the rows or columns is the same as 
the relationship among the figures in the other 
rows and columns. The relationship varies in 
the following ways: 

A. Matching. No differences exist among 
figures. 






B. Addition. Figures change by adding new 
attributes or additional figures. 








C. Subtraction. Figures change by subtracting
one or more attributes.



D. Alteration. One or more of the attributes 
or figures is changed or altered. 



E. Progressions. The same change continues 
between or among figures. 


3. Classification. The figure in the stimulus is a 
member of one of the sets of figures in the 
response alternatives. 



4. Intersections. A new figure is formed by 
joining parts of figures in the rows and 
columns. 



5. Progressions. The same change continues 
between or among figures. 
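
To make the rule descriptions above more concrete, here is a minimal sketch of how an Addition analogy might be modeled in code. The item is invented for illustration and is not an actual TONI-2 item; the Figure type, the helper name, and the idea of representing a figure by a shape and a count are all assumptions of this sketch.

```python
from typing import List, NamedTuple


class Figure(NamedTuple):
    """A hypothetical abstract figure: a shape repeated `count` times."""
    shape: str
    count: int


def complete_addition_row(reference_row: List[Figure],
                          first_figure: Figure) -> Figure:
    """Apply the count change seen in a complete row to an incomplete row.

    In an Addition analogy the relationship in one row (here, how many
    figures are added from the first cell to the second) is assumed to
    hold in the other row as well.
    """
    step = reference_row[1].count - reference_row[0].count
    return Figure(first_figure.shape, first_figure.count + step)


# Invented stimulus: the first row goes from one circle to two circles;
# the second row starts with one square and its second cell is empty.
reference = [Figure("circle", 1), Figure("circle", 2)]
incomplete_start = Figure("square", 1)

# Invented response alternatives, numbered from 1 as on a record form.
alternatives = [
    Figure("square", 1),
    Figure("circle", 2),
    Figure("square", 2),
    Figure("square", 3),
]

expected = complete_addition_row(reference, incomplete_start)
choice = alternatives.index(expected) + 1

if __name__ == "__main__":
    print("Expected figure:", expected)
    print("Response choice:", choice)
```

The sketch covers only one rule and one attribute. As the text notes, harder items combine several rules and characteristics, which a single count difference cannot capture.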





































Uses of the TONI-2 

The TONI-2 is a highly specialized, quick-score
test with two equivalent forms. Its content has been
restricted to a very narrow range so that an
administration format could be devised that would impose as
few linguistic and motoric restraints as possible on
the subjects taking the test. These were very deliberate
decisions made in the early stages of test development
and they produced some distinct advantages
and disadvantages for the TONI-2 user.

The language-free, motor-reduced administration
and response format makes the TONI-2 ideal for certain
hard-to-test subjects. Among these are subjects
who have acquired or developmental aphasia or other
severe spoken language disorders; subjects who are
deaf or hearing impaired; subjects who do not speak
English proficiently or who cannot read and write
English; and subjects who have language and/or
motor impairments stemming from such conditions
as cerebral palsy, stroke, or head trauma. When subjects
such as these are administered traditional
paper-and-pencil tests or tests requiring the understanding
and use of spoken language, the examiner cannot be
certain if low test scores are the result of impaired
intellectual functioning or if they could be attributed
instead to impaired sensory, language, or motor
functioning. The TONI-2 is uniquely well suited to these
specialized appraisal situations. Because the content
is also culturally reduced, the TONI-2 does not handicap
subjects who may not be familiar with mainstream
Western culture, especially the culture of the
United States.

Because the TONI-2 has two equivalent forms,
it is especially useful in situations where pre- and
post-measures are desirable. The TONI-2 would be
a suitable criterion measure in classic pre-post
research designs where pupil progress or program
effectiveness is being evaluated. It is similarly well
suited to the periodic reevaluation of individuals in
schools, clinics, and similar settings where the test
results would be seriously contaminated if the same
test were administered on multiple occasions. The
brevity of the TONI-2 is a distinct advantage in
situations where the subsequent use of the information
does not warrant the investment of extensive testing
time.


Conversely, some of the TONI-2’s advantages
become disadvantages in certain situations. A case
in point is the limited scope of the test’s content. We
deliberately restricted the TONI-2 items to
abstract/figural problem solving. Therefore, the test results do
not reflect a subject’s proficiency in any of the other
types of intelligent behavior. The TONI-2 is not
intended to replace broad-based tests of intelligence
or aptitude. Rather, it is intended to be used when
one cannot trust the results of such broad-based tests
because of the restraints imposed on some subjects
by language- and motor-laden administration and
response requirements.

With these variables in mind, the TONI-2 can be
used with confidence: (a) to estimate aptitude and
intellectual functioning; (b) to identify subjects who
are believed to have intellectual impairments,
especially subjects whose test performance may be
confounded by concurrent language and motor
impairments; (c) to verify referrals; (d) to formulate
hypotheses that may guide intervention or further
evaluation; and (e) in research efforts.


Summary 

The TONI-2 is a language-free, motor-reduced,
and culture-reduced measure of intellectual functioning
in subjects age 5-0 through 85-11 years. There are
two equivalent forms of the test containing 55 items
each. All of the items require abstract/figural problem
solving. The examiner pantomimes the instructions.
The subject looks at the stimulus items and responds
by means of pointing or other meaningful motor
activity. The test is norm referenced and yields a
deviation quotient score. The TONI-2 is particularly
valuable in the intellectual appraisal of subjects
whose test performance may be confounded by language
and motor impairments arising from such conditions
as aphasia, hearing impairments, lack of
proficiency with spoken or written English, cerebral
palsy, stroke, head trauma, and lack of familiarity
with the culture of the United States.






Administering and Scoring the TONI-2


This chapter contains specific instructions for
administering and scoring the TONI-2. A standard
set of administration guidelines and objective scoring
criteria are two of the qualities that contribute
to the standardization of an instrument. Such
instructions also reduce some of the major external sources
of error variance in the test’s scores. Therefore, examiners
are encouraged to read this chapter carefully
and to follow the instructions precisely.


Administration Procedures 

This section outlines the procedures that examiners
should follow when administering the TONI-2.
Both general and specific administration procedures
are designated, and instructions are given for group
administration as well. This section also specifies the
qualifications required of examiners who use the
TONI-2, the type of subjects to whom the test may
be given, and the amount of time typically needed for
testing.


General Administration Procedures 

The TONI-2 is not a difficult test to administer,
although its language-free format is unique and may
be unfamiliar to many examiners. So, read the test
manual carefully and become familiar with the test
items before administering the TONI-2.

It is important for an examiner to understand the
theory upon which the test is based, its technical
characteristics, and its proper uses. Because it also
is important to master correct administration and
scoring procedures, we recommend that novice examiners
undertake at least three practice administrations
before using the TONI-2 in an actual appraisal.
Any questions about correct usage or procedure
should be resolved by re-reading the appropriate section
of this manual or by consulting the Research
Department at PRO-ED, Inc.

The TONI-2 has two equivalent forms, Form A
and Form B. Examiners may use either form. We
recommend the use of Form A when the test is administered
for the first time and the use of Form B on
subsequent occasions. Having two forms means that
the TONI-2 will be especially useful in situations
where pre- and post-measures are needed, such as the
evaluation of student progress or program effectiveness,
periodic reevaluations, and other situations
where significant contamination may result from
the administration of the same form on multiple
occasions.

If there is any reason to suspect that the results
of the TONI-2 may not accurately reflect a subject’s
ability, then we recommend that the examiner administer
the alternate form a few days later to confirm
the findings. A second administration two or three
days later will verify the consistency of the scores.

Even though the TONI-2 is a nonverbal measure,
feel free to talk to subjects before the testing session
starts, establishing rapport and explaining the purpose
of the test. Avoid the use of emotionally laden
words such as “intelligence test” or “IQ” during these
conversations.

Once testing has begun, it is acceptable to praise
or encourage subjects, using facial expressions or
gestures rather than verbal encouragement. However,
any comments that reflect on the accuracy or
value of a subject’s responses should be avoided.

Administer the TONI-2 in a comfortable location
that is well lighted and well ventilated. The test setting
should be quiet and as free from distractions or
interruptions as possible. The examiner should strive
to create an environment that conveys a feeling of
confidentiality and privacy to the person taking the
TONI-2.

The TONI-2 usually will be administered individually.
However, group administration is acceptable
and instructions are given later in this section for
using the group format.

Specific Administration Procedures 

To ensure that the TONI-2 is given in a standard
manner that closely approximates the conditions
under which the test’s norms were obtained, examiners
should follow the specific administration procedures
outlined below.

1. Determine which form of the TONI-2 will be
administered. Usually, Form A will be administered
on the first occasion that a subject is tested
and Form B will be administered on subsequent
occasions.

2. Prepare the testing location. There should be a
table or other flat surface with a chair for the
examiner and a chair for the subject. Ideally, a
tabletop easel will be available to hold the Picture
Book.

3. Assemble the appropriate test materials. The
examiner will need a copy of the TONI-2 Picture
Book, a copy of the correct Answer Booklet and
Record Form, and a pencil. Be certain that the
Answer Booklet and Record Form corresponds to
the form of the TONI-2 that is being administered.
The Form A and Form B Answer Booklets
are not interchangeable.

4. Establish rapport with the individual taking the
test. Explain the purpose for taking the TONI-2
and describe the ways in which the results will
or will not be used.

5. Ask the subject to complete the identifying information
requested on the front of the Answer
Booklet and Record Form. Some people may need
assistance in providing this information. In fact,
examiners may find it more efficient to complete
this information themselves before the test session
begins.

6. Place the Picture Book, preferably on an easel, 
in front of the subject so that both the subject and 
the examiner can see the items. The Picture Book 
is correctly oriented when the stimulus items 
appear at the top of the page and the response 
choices are at the bottom. 

7. Begin by administering the first training item,
Item T1. Gesture through the sequence of the
stimulus and then point to the empty square in
the stimulus pattern. Look questioningly at the
subject. Point to the first response choice and then
back to the empty square in the stimulus. Shake
your head “yes” or “no” depending on the correctness
of the response. Do this for each response
choice. Encourage subjects to join you in indicating
correctness and allow them to complete the
remaining five training items without prompting
if they clearly understand the process. If a subject
is responding impulsively or does not seem
to understand what is expected after all six training
items have been administered, then readminister
the training items one more time. If a
subject still responds impulsively or still does not
understand what is expected when the six training
items have been administered twice, then
discontinue testing. The training items are
not scored and the examiner does not record
responses to them.

8. If the subject completes the training items and
appears to understand the task, then proceed to
the actual test items. Turn to the items in the Picture
Book that correspond to the form of the
TONI-2 being given. Form A begins on one side
of the Picture Book and Form B items are printed
on the reverse side. Begin testing with Item A1
(or B1, if Form B is being administered) if the subject
is very young or very old, if the subject is
suspected of having a significant intellectual
impairment, or if the subject encountered some
difficulty with the training items. For all other
subjects, begin testing with the item indicated
below. Arrows corresponding to these guidelines
are printed in Section VII of the Answer Booklet
and Record Form.





Figure 2.1. Response Choice Numbers 


Figure 2.2. Responses Marked on the Answer Booklet and Record Form



If the subject        Begin testing
is age:               with item:

5-7 years             A1 or B1
8-9 years             A5 or B5
10-12 years           A10 or B10
13-17 years           A15 or B15
18-20 years           A20 or B20
21+ years             A25 or B25


9. To administer the actual test items, point to the
empty square in the stimulus pattern, make a
sweeping gesture along the line of response
choices, and point to the empty square again.
Look questioningly at the subject. Allow the subject
to indicate her or his choice by pointing to
it or by making some other meaningful nonverbal
response. Allow subjects all the time that they
seem to need, but do not permit them to dawdle
or to ponder test items too long. After 20 or 30
seconds have passed, the probability of a subject
making a correct response decreases substantially.

10. Record the subject’s response in Section VII of the
Answer Booklet and Record Form by placing an
“X” over the number of the response selected by
the subject. Figure 2.1 shows the numbers associated
with each response choice. The examiner
can tell by looking at the Answer Booklet which
response is the correct one: The number of the correct
response is printed in boldface type inside a
circle. Figure 2.2 shows an Answer Booklet that
has been correctly marked.

11. Items are arranged in order of difficulty, beginning
with the easiest items and progressing to
more difficult items. This makes it possible to
apply basals and ceilings in order to reduce substantially
the time required to give the test. Continue
testing until Item A55 or B55 has been
administered or until the subject has made three
incorrect responses in five consecutive items. This
is the ceiling.

12. When the subject has reached Item A55 or B55,
or has reached the ceiling (i.e., has missed three
out of five consecutive items), discontinue testing
and review the earlier responses to determine if
a basal was achieved. A basal is defined as five
consecutive correct responses. If the subject did
not achieve a basal (i.e., did not give five right
answers in a row) and if testing did not begin with
Item A1 or B1, then return to the entry point (i.e.,
the first item administered) and test downward
until a basal of five consecutive correct responses
is established or until Item A1 or B1 is administered.

13. Discontinue testing when both a ceiling (i.e.,
three incorrect responses in five consecutive
items, or administration of Item A55/B55) and a
basal (i.e., five consecutive correct responses, or
administration of Item A1/B1) have been established.


Group Administration Procedures


Although the TONI-2 is intended to be administered
individually, it is possible for a trained and
experienced examiner to administer the test to small
groups of no more than five subjects with no loss of
accuracy or stability. Group administration should
only be undertaken by experienced examiners who
are familiar with the test, comfortable with the nonverbal
administration format, and at ease with the
scoring process, particularly the application of basals
and ceilings during the actual test administration.

The general administration procedures remain
unchanged when the group option is exercised.
Instructions are given below.

1. Determine which form of the TONI-2 will be 
administered to each subject in the group. To 
ensure that subjects are not solving the same 
items simultaneously, the examiner should 
administer Form A to the first subject, Form B 
to the second, Form A to the third, and so on. 

2. Prepare the testing location. There should be a 
table or other flat surface with a chair for the 
examiner and one for each subject being tested. 
Ideally, a tabletop easel should be available to 
hold each subject’s Picture Book. 

3. Assemble the appropriate test materials. The
examiner will need a copy of the TONI-2 Picture
Book for each subject, a copy of the Answer Booklet
and Record Form for each subject, and a
pencil. Be certain that the Answer Booklets correspond
to the form of the TONI-2 that is being
administered. The Form A and Form B booklets
are not interchangeable.

4. Establish rapport with the individuals taking the 
test. Explain the purpose for taking the TONI-2 
and describe the ways in which the results will 
or will not be used. 

5. Ask the subjects to complete the identifying information
requested on the front of the Answer
Booklet and Record Form. Some people may need
assistance in providing this information. In fact,
examiners may find it more efficient to complete
this information themselves before the test session
begins.





6. Place a Picture Book, preferably on an easel, in 
front of each subject so that both the subject and 
the examiner can see the items. The Picture Book 
is correctly oriented when the stimulus items 
appear at the top of the page and the response 
choices are at the bottom. 


7. Begin by administering the first training item,
Item T1, to the first subject. Gesture through the
sequence of the stimulus and then point to the
empty square in the stimulus pattern. Look questioningly
at the subject. Point to the first response
choice and then back to the empty square in the
stimulus. Shake your head “yes” or “no” depending
on the correctness of the response. Do this for
each response choice. Administer Item T1 to the
remaining subjects in this same manner, then
administer the remaining training items.
Encourage subjects to join you in indicating correctness
and allow them to complete the remaining
five training items without prompting if they
clearly understand the process. If a subject is
responding impulsively or does not seem to
understand what is expected after all six training
items have been administered, then readminister
the training items one more time. If a
subject still responds impulsively or still does not
understand what is expected when the six training
items have been administered twice, then
discontinue testing. The training items are
not scored and the examiner does not record
responses to them.

8. After the training items have been successfully
administered, turn to the items in the Picture
Book that correspond to the form of the TONI-2
being given. Form A begins on one side of the Picture
Book and Form B items are printed on the
reverse side. Begin testing with Item A1 (or B1,
if Form B is being administered) if the subject is
very young or very old, if the subject is suspected
of having a significant intellectual impairment,
or if the subject encountered some difficulty with
the training items. For all other subjects, begin
testing with the item indicated below. Arrows
corresponding to these guidelines are printed in
Section VII of the Answer Booklet and Record
Form. Each subject in the group may begin testing
at a different level.

If the subject        Begin testing
is age:               with item:

5-7 years             A1 or B1
8-9 years             A5 or B5
10-12 years           A10 or B10
13-17 years           A15 or B15
18-20 years           A20 or B20
21+ years             A25 or B25


9. To administer the actual test items, begin with
the first subject in the group. Point to the empty
square in the stimulus pattern of the first item
administered, make a sweeping gesture along the
line of response choices, and point to the empty
square again. Look questioningly at the subject.
Allow each subject to indicate her or his choice
by pointing to it or by making some other meaningful
nonverbal response. Allow subjects to take
as much time as they seem to need, but do not
permit them to dawdle or to ponder test items too
long. After 20 or 30 seconds have passed, the
probability of a subject making a correct response
decreases substantially. Do this for each subject
in the group, returning to administer the second
item to the first subject. Continue in this pattern.

10. Record each subject’s response on the inside of the
Answer Booklet and Record Form by placing an
“X” over the number of the response selected by
the subject. Figure 2.1 shows the numbers associated
with each response choice. The examiner
can tell by looking at the Answer Booklet which
response is the correct one: The number of the correct
response is printed in boldface type inside a
circle. Figure 2.2 shows an Answer Booklet that
has been correctly marked.

11. Items are arranged in order of difficulty, beginning
with the easiest items and progressing to
more difficult items. This makes it possible to
apply basals and ceilings in order to reduce substantially
the time required to give the test. Continue
testing until Item A55 or B55 has been
administered or until the subject has made three
incorrect responses in five consecutive items. This
is the ceiling.

12. When the subject has reached Item A55 or B55,
or has reached the ceiling (i.e., has missed three
out of five consecutive items), discontinue testing
and review the earlier responses to determine if
a basal was achieved. A basal is defined as five
consecutive correct responses. If the subject did
not achieve a basal (i.e., did not have five right
answers in a row) and if testing did not begin with
Item A1 or B1, then return to the entry point (i.e.,
the first item administered) and test downward
until a basal is established or Item A1/B1 is
administered.

13. Discontinue testing when both a ceiling (i.e.,
three incorrect responses in five consecutive
items, or administration of Item A55/B55) and a
basal (i.e., five consecutive correct responses, or
administration of Item A1/B1) have been established.
Each subject in the group will likely complete
the test at a different time.
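The entry-point table and the discontinue rule in the steps above can be sketched in a few lines of Python. This is an illustration only, not part of the TONI-2 materials: the names ENTRY_ITEM_BY_AGE, entry_item, and ceiling_reached are hypothetical, and the flag start_at_first_item stands in for the examiner's judgment that a subject should begin with Item A1 or B1.

```python
# Entry points copied from the age table above, as (youngest, oldest, item).
# The upper bound of 85 reflects the 5-0 through 85-11 age range of the test.
ENTRY_ITEM_BY_AGE = [
    (5, 7, 1),
    (8, 9, 5),
    (10, 12, 10),
    (13, 17, 15),
    (18, 20, 20),
    (21, 85, 25),
]

LAST_ITEM = 55  # each form ends with Item 55


def entry_item(age_years, start_at_first_item=False):
    """Return the item number at which upward testing begins.

    start_at_first_item is a hypothetical flag for the examiner's judgment
    (very young or very old subject, suspected impairment, or difficulty
    with the training items).
    """
    if start_at_first_item:
        return 1
    for youngest, oldest, item in ENTRY_ITEM_BY_AGE:
        if youngest <= age_years <= oldest:
            return item
    raise ValueError("age is outside the range covered by the table")


def ceiling_reached(scores_so_far, last_item_given):
    """Return True when upward testing should stop.

    scores_so_far holds 1 (correct) or 0 (incorrect) for the items given
    so far, in ascending item order. Testing stops at Item 55 or when the
    five most recent items contain three incorrect responses.
    """
    if last_item_given >= LAST_ITEM:
        return True
    return scores_so_far[-5:].count(0) >= 3
```

For an 11-year-old, entry_item(11) returns 10, which matches the 10-12 years row of the table. The basal check and any downward testing are still carried out afterward, as described in steps 12 and 13.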





Examiner Qualifications 

The TONI-2 may be administered by teachers,
psychologists, psychological associates, educational
diagnosticians, and other qualified professionals who
read and follow the directions in this manual carefully.
Regardless of their professional positions,
though, examiners should have had formal training
in appraisal that imparted a working knowledge of
psychometrics, an understanding of general testing
procedures, insight into the uses and abuses of
norm-referenced tests and test scores, and a close
acquaintance with the specific assessment procedures
employed in the area of intelligence or aptitude
testing.

A qualified examiner will know how specific tests
and other assessment tools have been built, how they
are intended to be used, and how they are scored and
interpreted. Examiners also should be able to evaluate
the quality of the tests that they use in order to
ensure that only the most appropriate and best built
instruments available will be administered.

Supervised training and practice are highly recommended.
Usually this formal instruction and
supervised practice are obtained through university
coursework or through workshops sponsored by professional
organizations, local or state education agencies,
or private consultants.

Examiners should adhere to the specific policies
governing the administration and interpretation of
intelligence tests that have been adopted by their
particular school, agency, practice, or other administrative
unit. Usually, there are local and state regulations
that govern such testing.


Subject Qualifications 

The TONI-2 may be administered to subjects
ranging in age from 5-0 through 85-11 years. Because
its administration and response format is completely
free of language and requires only a minimal motor
response, the TONI-2 is particularly well suited to
the needs of individuals who will be handicapped by
the formats of traditional paper-and-pencil tests or
tests requiring a significant amount of speaking,
listening, reading, or writing. These include (a) subjects
with known or suspected spoken language disorders,
such as acquired or developmental aphasia;
(b) people who are deaf or who have a significant hearing
impairment; (c) individuals who do not speak
English proficiently; (d) subjects who cannot read or
write English; and (e) those with motor impairments
caused by such conditions as cerebral palsy, stroke,
or head injury.

The TONI-2 is not intended to replace comprehensive
assessment batteries, but to provide an alternative
for subjects who present special problems in
the testing situation. The TONI-2 should be used
when the examiner does not have confidence in the
results of other types of broad-based intelligence tests
because of potential contamination from language or
motor impairments.

Testing Time 

The TONI-2 is not a timed test. Subjects can take 
as long as they like to respond to each item, although 
they should not be permitted to dawdle. In general, 
it should take about 15 minutes to administer one 
form of the TONI-2. 


Scoring Procedures 

A subject’s score is calculated in Section VIII of
the Answer Booklet and Record Form. The examiner
begins by computing the total raw score and then
converting the raw score to a quotient. These procedures
are described next.

Computing Raw Scores 

This section will instruct examiners how to compute
the total raw score for the TONI-2 by determining
the number of correct and incorrect responses
made by the subject and then applying basals and
ceilings.

Determining the Number of Correct and Incorrect
Responses. During the administration of
the TONI-2, the examiner indicated the subject’s
responses by marking an “X” over the response
chosen. If the “X” is marked over a correct response
(i.e., over a number printed in boldface type and
contained in a circle), then the examiner should write a
“1” in the blank preceding the item number. If the
“X” is marked over an incorrect response (i.e., over
a number that is not contained in a circle), then the
examiner should write a “0” in the blank preceding
the item number. Figure 2.3 shows how this has been
done.

Applying Basals and Ceilings. After the correct
and incorrect responses have been noted, the examiner
must identify the basal item. A basal is achieved
when the subject makes five consecutive correct
responses. The basal item is the last, or highest
numbered, of those five items. Circle the basal item
and write its number in the appropriate blank in the
Score Summary area in Section VIII of the Answer
Booklet and Record Form. If the subject never
achieves a basal, then write a “0” in the blank where
the number of the basal item should be entered.

Sometimes the examiner will encounter a so-called
false or multiple basal when scoring a test. This
happens when two or more basals are reached during
testing. The true basal is the one closest to the
ceiling. For instance, if the subject achieves a first
basal by responding correctly to Items 5, 6, 7, 8, and
9, and then achieves a second basal by responding
correctly to Items 11, 12, 13, 14, and 15, then Items 11-15
constitute the true basal and Item 15 is the basal
item.

When the basal item has been identified, the
examiner then must identify the ceiling item. A ceiling
is achieved when the subject makes three errors
within five consecutive items. The ceiling item is the
last, or highest numbered, of the three errors. Circle
the ceiling item.

Just as there are false basals, there also can be
false ceilings. This happens when two or more ceilings
are reached during testing. The true ceiling is
the one closest to the basal. For instance, if there is
a ceiling where Item 20 is incorrect, Item 21 is correct,
Item 22 is incorrect, Item 23 is correct, and Item
24 is incorrect, followed by a second ceiling where
Items 27, 28, 29, 30, and 31 are all incorrect, then
Items 20-24 constitute the true ceiling and Item 24
(the third error) is the ceiling item.
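The rules for true basals and true ceilings can be expressed as a short Python sketch. It rests on one stated assumption about how the two rules combine: the true ceiling is taken to be the lowest-numbered ceiling, and the true basal is taken to be the highest run of five correct responses below that ceiling. The function names and the dictionary layout are hypothetical, and the sketch assumes that every item between the lowest and highest item given was administered.

```python
def find_ceiling_item(scores):
    """Return the ceiling item, or None if no ceiling was reached.

    scores maps item number -> 1 (correct) or 0 (incorrect). The ceiling
    item is the third error within five consecutive items; the lowest
    such item is returned (assumed reading of the "true ceiling" rule).
    """
    errors = sorted(item for item, score in scores.items() if score == 0)
    for k in range(2, len(errors)):
        # Three errors fit in a five-item window when first and third
        # are at most four item numbers apart.
        if errors[k] - errors[k - 2] <= 4:
            return errors[k]
    return None


def find_basal_item(scores, ceiling_item=None):
    """Return the basal item, or 0 if no basal was achieved.

    The basal item is the highest-numbered item that ends a run of five
    consecutive correct responses below the ceiling item (or anywhere in
    the record when no ceiling was reached).
    """
    upper = ceiling_item if ceiling_item is not None else max(scores) + 1
    for last in range(upper - 1, 4, -1):
        if all(scores.get(item) == 1 for item in range(last - 4, last + 1)):
            return last
    return 0


# False basal example from the text: Items 5-9 and 11-15 correct.
# Item 10 is assumed incorrect, since it separates the two basals.
false_basal = {item: 1 for item in range(5, 16)}
false_basal[10] = 0

# False ceiling example from the text: errors on Items 20, 22, and 24,
# then on Items 27-31. Items 25 and 26 are assumed correct.
false_ceiling = {20: 0, 21: 1, 22: 0, 23: 1, 24: 0, 25: 1, 26: 1,
                 27: 0, 28: 0, 29: 0, 30: 0, 31: 0}

basal_item = find_basal_item(false_basal)        # Items 11-15 form the true basal
ceiling_item = find_ceiling_item(false_ceiling)  # third error of the first ceiling
```

Under these assumptions the sketch reproduces the two examples in the text: Item 15 as the basal item and Item 24 as the ceiling item.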

Figure 2.3 contains several examples of the proper 
application of basals and ceilings, including some 
examples of false basals and ceilings. In all of the 
examples, testing began with Item 15. 

Example 1 presents a straightforward situation.
Testing began with Item 15. The examiner continued
testing until Item 26. At that point the subject had
made three errors (Items 22, 24, and 26) within five
consecutive items (Items 22-26). Item 26 is the ceiling
item. Because the subject had already achieved
a basal of five consecutive correct responses to Items
17-21, there was no need to test downward. The basal
item is Item 21. The subject made two correct
responses between the basal and ceiling items.

Example 2 shows the scoring procedure when no
basal is achieved. Testing began at Item 15 and was
discontinued after Item 20 when the subject had made
three incorrect responses (Items 18-20) within five
consecutive items (Items 16-20). Item 20 is the ceiling
item. Because no basal had been achieved, the
examiner began testing downward. The subject never
made five consecutive correct responses, so there is
no basal item. The examiner recorded a “0” in the
blank for the number of the basal item. There were
13 correct responses from the beginning of the test
until the ceiling item.

Example 3 shows the correct procedure to use
when there is a double basal and no ceiling is reached.
Testing began at Item 15 and continued until Item
55 had been administered. A false basal occurred at
Items 18-22, but the true basal (i.e., the one closer
to the ceiling) occurred at Items 25-29. The basal
item is Item 29. Between that point and the end of
the test, the subject made 18 correct responses
without ever making three errors in any five consecutive
responses.

Example 4 shows the procedure that should be
used when there is a double or false ceiling. The
examiner began testing at Item 15 and continued
until he thought the subject had reached a ceiling at
Item 33. Because no basal had been achieved, the
examiner tested downward until there were five consecutive
right answers at Items 13-17. Item 17 is the
basal item. On review, the examiner discovered that
a true ceiling (i.e., one closer to the basal) had
occurred at Items 22-26 when the subject made errors
on Items 22, 24, and 26. Item 26 is the ceiling item.
There were 5 correct responses between the basal and
ceiling items.

Computing the Total Raw Score. The total raw
score for the TONI-2 is simple to calculate. Use the
Score Summary area in Section VIII of the Answer
Booklet and Record Form. First, write the item number
of the basal item in the appropriate blank or write
a “0” if no basal was achieved. Second, count the
number of correct responses made between the basal
item and the ceiling item; if ceiling was never
reached, then count the number of correct responses
made between the basal item and the end of the test.
Finally, add these two numbers together to determine
the total raw score.

Refer again to the examples in Figure 2.3. In
Example 1, the basal item is Item 21. The subject then
made two additional correct responses before reaching
ceiling at Item 26. Therefore, the total raw score in
Example 1 is 23 points. In Example 2, there is no
basal item and a “0” is recorded in the basal item
blank. The subject made 13 correct responses from the
beginning of the test until reaching a ceiling at Item
20, resulting in a total raw score of 13 points. In
Example 3, the basal item is Item 29. The subject then
made 18 correct responses between the basal item and
the end of the test without ever reaching ceiling. The
total raw score is 47 points. In Example 4, Item 17
is the basal item and Item 26 is the ceiling item.
There were 5 correct responses between them, making
the total raw score 22 points.
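The arithmetic in the four examples can be checked with a brief Python sketch. The function name total_raw_score is hypothetical; the basal item numbers and counts of correct responses are the ones given in the text above.

```python
def total_raw_score(basal_item, correct_after_basal):
    """Add the basal item number (0 if no basal was achieved) to the
    number of correct responses between the basal and the ceiling
    (or the end of the test when no ceiling was reached)."""
    return basal_item + correct_after_basal


# (basal item, correct responses after the basal) for Examples 1-4.
examples = {1: (21, 2), 2: (0, 13), 3: (29, 18), 4: (17, 5)}

totals = {number: total_raw_score(basal, correct)
          for number, (basal, correct) in examples.items()}
# totals == {1: 23, 2: 13, 3: 47, 4: 22}
```

Adding the basal item number has the effect of crediting every item up to and including the basal item, which is why only the responses above the basal need to be counted individually.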


                              Example 1   Example 2   Example 3   Example 4

Score Summary

Basal Item                       21           0          29          17

No. of Correct Responses
Between Basal and Ceiling         2          13          18           5

Total Raw Score                  23          13          47          22


Figure 2.3. Correct Application of Basals and Ceilings





Converting Raw Scores to Deviation 
Quotients and Percentile Ranks 

Table A in the appendix of this manual is provided
to convert raw scores for Form A of the TONI-2 into
quotients and percentile ranks. Table B is used for Form
B of the test. Find the subject’s total raw score in the
outside column of the appropriate table. Then move
horizontally into the table, stopping at the appropriate
column for the subject’s chronological age. The
corresponding quotient (Q) and percentile rank (%ile)
are designated. These numbers should be entered in
the appropriate blanks in Section II of the Answer
Booklet and Record Form and the quotient may be
plotted on the profile in that same area. Figure 3.1
in the next chapter shows how this has been done for
the sample subject, Al.





3 

Interpreting the Results of the TONI-2 


This chapter contains information that will help
examiners to record and report the results of the
TONI-2 and then to shed some light on the meaning
of the test results. The chapter begins with instructions
for completing the TONI-2 Answer Booklet and
Record Form. This is followed by guidelines for
interpreting test scores in general and deviant test scores
in particular, including means for comparing standard
scores to each other. Procedures for developing
local and specialized norms are summarized. The
chapter continues with a discussion of the ways in
which TONI-2 results can be shared with other
professionals and with the test subjects themselves. A
caveat concerning the use of test information closes
this chapter.


Completing the Answer Booklet 
and Record Form 

The TONI-2 Answer Booklet and Record Form is
where all of the information concerning the test
administration is recorded. Space is provided on the
front side of the booklet to record identifying information
about the person to whom the test was administered,
to record and graph the results of the TONI-2
along with the results of other tests that may have
been administered, and to describe the administration
conditions. Figure 3.1 shows this page completed
for a sample subject, Al. The inside pages are reserved
for information concerning the actual administration
of the test. Instructions are summarized and space is
provided to record the subject’s responses to individual
items and to record anecdotal comments concerning
the test administration. The previous chapter
dealt with the use of these portions of the Answer
Booklet and Record Form. The back page contains the
score summary section where the total raw score is
calculated and a section for the examiner to record
comments and observations. Figure 3.2 shows this
page completed for Al. Specific instructions for completing
the various sections of the TONI-2 Answer
Booklet and Record Form are offered next.

Section I. Identifying Information 

The information requested in this section identifies
both the subject who was tested and the examiner
or diagnostician who administered the TONI-2.
Figure 3.1 shows how the following are recorded: the
subject’s name (Al B.) and parent or guardian (John
and Jan B.); for school-aged subjects, the student’s
school (Northside Middle School) and grade (6); the
examiner’s name and title (M. Means, School
Psychologist); and the test date (September 1989), the
subject’s date of birth (July 1978), and the subject’s
age at testing (11-2).
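The age at testing shown for Al can be reproduced with simple month arithmetic. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes that only the year and month of each date are used, as in the dates quoted above. When the day of the month is also recorded, the instructions printed on the form take precedence.

```python
def age_at_testing(test_year, test_month, birth_year, birth_month):
    """Return the age at testing as (years, months), ignoring days."""
    total_months = (test_year - birth_year) * 12 + (test_month - birth_month)
    if total_months < 0:
        raise ValueError("the test date precedes the date of birth")
    return divmod(total_months, 12)


# Al was tested in September 1989 and was born in July 1978.
years, months = age_at_testing(1989, 9, 1978, 7)
age_label = f"{years}-{months}"  # written in the manual's years-months form
```

For Al's dates the result is 11 years and 2 months, which is written 11-2 on the record form.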





Section II. Profile of Test Results 

This is the section where the examiner notes the
subject’s TONI-2 quotient and percentile rank by
writing them in the blanks at the bottom of the section.
The standard error of measurement (SEm), found
in Table 3.3, is recorded there, too. In addition, the
quotient is depicted graphically on the profile. When
the quotient has been plotted on the profile, the
examiner can tell at a glance which scores are in the
normal range of functioning (i.e., scores plotted in the
unshaded portion of the profile) and which are either
significantly higher or significantly lower than normal
(i.e., scores plotted in the upper and lower shaded
portions of the profile, respectively). Al’s score is
recorded and profiled in Figure 3.1.


Section III. Other Test Data

It is rare that a test will be given in isolation.
Usually, a battery of appraisal tools will be administered.
For that reason, Section III is provided for
examiners to record the results of any other tests that
may have been administered. These scores also are
profiled in Section II. Three other tests were administered
to Al: the Detroit Tests of Learning Aptitude,
Second Edition (DTLA-2), the Quick Score
Achievement Test (Q-SAT), and the Test of Language
Development-Intermediate, Second Edition
(TOLD-2I). The results of these tests are recorded in
Section III and profiled in Section II.


Section IV. Administration Conditions 

An examiner will always take into account the
conditions under which a test was administered when
interpreting the results of that test. The following
information is recorded in Section IV: individual or
group administration, site conditions which may have
interfered with the administration of the test or
confounded the subject’s performance on the test, and a
list of variables related to the subject taking the test.
The importance of this information is obvious. If the
TONI-2 were given in a room where the air conditioning
was working overtime or if the subject came
to the testing session with a severe head cold, then
it is possible that the test results could have been
affected by these variables. Ms. Means, the school
psychologist who administered the TONI-2 to Al,
noted in Figure 3.1 that the testing site was satisfactory,
but that Al came to the test session with a
negative attitude and appeared to be fidgety and
overactive.


Section V. Instructions 

This section summarizes the procedures for
administering the TONI-2 and for using basals and
ceilings. Examiners should be familiar with this information
before they initiate testing. Detailed instructions
are provided in Chapter 2 of this manual and
examiners are urged to consult the manual directly
if there are any questions about correct procedure.

Section VI. Anecdotal Comments 

This section is provided for examiners to record
any details observed during the administration of the
test that may explain test performance. For instance,
a subject may respond impulsively, pointing immediately
to a response without ever looking at the stimulus
items or the response choices. Or, the subject
may squint at the test materials or attempt to rotate
her or his body in order to view the materials differently.
Information of this type should be recorded
here.


Section VII. Responses to the 
TONI-2 

Section VII is where the examiner records the subject’s
responses to each of the test items and makes
note of the basal and ceiling items. The previous
chapter dealt with the use of Section VII and Figure
2.2 shows how the responses should be marked.

Section VIII. Score Summary 

This section, printed on the last page of the
Answer Booklet and Record Form, is where the examiner
computes the total raw score for the TONI-2.
Scoring instructions were detailed in the previous
chapter. Figure 3.2 shows how this section has been
completed for Al.

Section IX. Interpretation and 
Recommendations 

In many ways Section IX is one of the most important
parts of the Answer Booklet and Record Form.
This is where the examiner synthesizes the test scores
and other data that have been collected, interprets
this information, and then makes recommendations
for further evaluation, possible program placement,
or some specific intervention plan. On Al’s record
form in Figure 3.2, the psychologist has recommended
additional evaluation in the area of spoken language.
Al’s TONI-2 quotient and DTLA-2 Nonverbal composite
quotient appear to be normal, but his Verbal
quotient on the DTLA-2 and his TOLD-2 Intermediate
scores are quite low. His Q-SAT achievement
scores are also below average. This pattern suggests
that a language disorder, rather than an intellectual
deficit, may be contributing to subaverage achievement.


Figure 3.1. Front Page of Answer Booklet and Record Form Completed for Al


Interpreting Test Scores 

The purpose of this section is to familiarize readers
with the different test scores associated with the
TONI-2. In addition to raw scores, two kinds of normative
scores are reported: percentile ranks and deviation
quotients. All three scores will be discussed
here.


Raw Scores 

Raw scores are the original numeric values asso¬ 
ciated with performance on the TONI-2. Raw scores 
are, quite simply, a tally of the responses to the test 
items. They have no inherent meaning. There is no 
way for an examiner to know if a particular raw score 
is considered to be normal or not, or if one raw score 
represents significantly better or worse performance 
than another raw score. The fundamental value of a 
raw score is that it can be converted into a variety 
of standard scores calculated directly from a cumu¬ 
lative frequency table of all the raw scores earned by 
the normative sample of the test. 

Percentile Ranks 

Percentile ranks are reported for both forms of the
TONI-2. Examiners may use Table A to convert raw
scores to percentile ranks for Form A and Table B for
Form B.

Percentile ranks are useful and easily understood
scores, and they are frequently encountered in educational
and psychological reports. The principal drawback
to percentile ranks is that they are not interval data;
the unequal distance between score points means that
percentile ranks cannot be averaged or otherwise
operated on arithmetically.

Percentile ranks, like other transformed scores,
are derived directly from the raw score distribution
of a test. A percentile rank indicates the percentage of
scores in the normative group that fall at or below
the score in question. A percentile rank of 50 is the
median of the distribution. It is the midpoint
in the score distribution and represents exactly average
or statistically normal test performance. Larger
percentile ranks indicate higher or better test performance;
smaller percentile ranks indicate lower or
worse performance.

Because they are so straightforward, percentile 
ranks are very easy to understand and interpret. If 
Al earns a percentile rank of 45 on the TONI-2, his
score is slightly below average: It indicates that 55% 
of his age-mates in the test’s normative sample 
earned higher scores, while 45% scored at the same 
level or below Al. On the other hand, if Peg has a 
percentile rank of 80, her performance is well above 
average: It indicates that only 20% of her age-mates 
in the TONI-2 normative sample scored higher. 

Examiners should remember that percentile
ranks are not interval data. Equivalent differences
in percentile ranks cannot be interpreted as equivalent
differences in the attribute being measured.
Because the underlying scores are approximately normally
distributed, they cluster heavily around the mean, or
50th percentile, and are more sparsely distributed at
either tail of the normal curve. Therefore, the difference
between percentile ranks of 50 and 55 represents a
much smaller difference in actual performance than
does the same 5-point difference between percentile
ranks of 5 and 10 or between percentile ranks of 90
and 95.
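
The unequal spacing of percentile ranks can be illustrated numerically. The sketch below is not part of the TONI-2 materials; it assumes a perfectly normal score distribution and uses a hypothetical helper, `z_gap`, to express a percentile rank difference in standard deviation units.

```python
from statistics import NormalDist


def z_gap(pr_low: float, pr_high: float) -> float:
    """Distance in standard deviation units between two percentile ranks,
    assuming the underlying scores are normally distributed."""
    unit_normal = NormalDist()  # mean 0, standard deviation 1
    return unit_normal.inv_cdf(pr_high / 100) - unit_normal.inv_cdf(pr_low / 100)


# The same 5-point percentile difference at the center and in each tail.
for low, high in [(50, 55), (5, 10), (90, 95)]:
    print(f"PR {low} to {high}: {z_gap(low, high):.2f} SD")
```

Under this assumption, the 5-point gap near the tails spans close to three times as many standard deviation units as the gap between the 50th and 55th percentiles.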


Deviation Quotients 

Deviation quotients are a type of standard score. 
They are sometimes referred to as derived or trans¬ 
formed scores because they are derived directly from 
the raw score distribution and then are transformed 
into a normalized score distribution. Standard scores 
are expressed as standard deviation units to indicate 
a score’s distance from the average performance of 
the normative sample by assigning a predetermined 
mean and standard deviation to the transformed dis¬ 
tribution. The deviation quotients used in the 
TONI-2 have a mean of 100 and a standard devia¬ 
tion of 15. This distribution will be familiar to the 
users of most tests of intelligence and aptitude. The 
larger the quotient, the better the performance that 
it represents. Table 3.1 provides some useful guide¬ 
lines for interpreting quotients. 

Remembering the mean and standard deviation
of the TONI-2 quotients, one knows that Al’s quotient
of 85 on Form A is an indication that he is performing
one standard deviation below the mean of his
age-mates in the normative sample. Peg’s quotient of 100
is exactly average or normal for her age group. If she





[Figure 3.2. Last Page of the Answer Booklet and Record Form Completed for Al]













TABLE 3.1

Guidelines for Interpreting TONI-2 Quotients

Quotient    Rating           % Included

>130        Very Superior     2.34
121-130     Superior          6.87
111-120     Above Average    16.12
90-110      Average          49.51
80-89       Below Average    16.12
70-79       Poor              6.87
<70         Very Poor         2.34


had earned a quotient of 111, her performance would 
have placed her somewhat above the average level. 
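
The guidelines in Table 3.1 can be expressed as a simple lookup. The following is a minimal sketch, not an official scoring routine; the function name `rate_quotient` is hypothetical, and the cutoffs are copied from Table 3.1.

```python
def rate_quotient(quotient: int) -> str:
    """Return the Table 3.1 rating for a TONI-2 deviation quotient."""
    if quotient > 130:
        return "Very Superior"
    if quotient >= 121:
        return "Superior"
    if quotient >= 111:
        return "Above Average"
    if quotient >= 90:
        return "Average"
    if quotient >= 80:
        return "Below Average"
    if quotient >= 70:
        return "Poor"
    return "Very Poor"


# Quotients mentioned in the text: Al's 85, Peg's 100, and Peg's hypothetical 111.
for q in (85, 100, 111):
    print(q, rate_quotient(q))
```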

Quotients can be computed for both TONI-2 
forms. Table A is used to convert raw scores to quo¬ 
tients on Form A and Table B is used for Form B. 

Quotients and other standard scores are somewhat
more valuable than percentile ranks, even
though both are computed directly from the
raw score distribution. Quotients are interval data
and can be compared directly to other standard
scores with the same mean and standard deviation.
This gives examiners the flexibility to compare a
subject’s performance to the performance of the
normative group or to the subject’s own performance
on another test, on an alternate form of the
TONI-2, or on a test taken at another time. Quotients
also can be added and subtracted and otherwise
acted on statistically, which makes them ideal for
research.

In fact, a quotient can be compared to any other
standard score, regardless of the distribution. Standard
scores are easily converted from one distribution
to another. Table 3.2 is particularly helpful when
comparing scores from several tests. Simply find the
score in the column corresponding to the existing
standard score distribution and move horizontally
to the column corresponding to the desired
distribution. A deviation quotient of 90, for instance,
corresponds to a z-score of -0.67, a T-score of 43.3,
a stanine of 4, and a percentile rank of 25. These
scores can be compared statistically or visually on a
profile.

To illustrate the use of this table, consider Peg’s
performance on the TONI-2 and on another test
whose results are reported as T-scores. The T-score
distribution has a different mean and standard deviation
than the quotient associated with the TONI-2.
To compare Peg’s T-score of 47 to her TONI-2 quotient
of 100, the examiner can use Table 3.2 to convert
the T-score to a quotient of 95.5, thereby learning
that the two scores are similar in magnitude.


TABLE 3.2

Converting Various Standard Scores into Percentile Ranks and TONI-2 Quotients

Percentile   TONI-2
Ranks        Quotients   NCEs   T-scores   z-scores   Stanines

99           150         99     83         +3.33      9
99           145         99     80         +3.00      9
99           140         99     77         +2.67      9
99           135         99     73         +2.33      9
98           130         92     70         +2.00      9
95           125         85     67         +1.67      8
91           120         78     63         +1.33      8
84           115         71     60         +1.00      7
75           110         64     57         +0.67      6
63           105         57     53         +0.33      6
50           100         50     50          0.00      5
37            95         43     47         -0.33      4
25            90         36     43         -0.67      4
16            85         29     40         -1.00      3
 9            80         22     37         -1.33      2
 5            75         15     33         -1.67      2
 2            70          8     30         -2.00      1
 1            65          1     27         -2.33      1
 1            60          1     23         -2.67      1
 1            55          1     20         -3.00      1
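
The conversions tabulated in Table 3.2 follow from expressing a score as a z-score and rescaling it. The sketch below assumes the conventional means and standard deviations for each scale (quotient 100 and 15, T-score 50 and 10, NCE 50 and 21.06); the `SCALES` dictionary and `convert` function are illustrative names, not part of the TONI-2 materials. Stanines and percentile ranks are omitted because they are not linear transformations.

```python
# (mean, standard deviation) for each linear standard score scale.
# The NCE and T-score parameters are the conventional values, assumed here.
SCALES = {
    "quotient": (100.0, 15.0),
    "T": (50.0, 10.0),
    "z": (0.0, 1.0),
    "NCE": (50.0, 21.06),
}


def convert(score: float, source: str, target: str) -> float:
    """Convert a standard score from one scale to another via its z-score."""
    source_mean, source_sd = SCALES[source]
    target_mean, target_sd = SCALES[target]
    z = (score - source_mean) / source_sd
    return target_mean + target_sd * z


# Peg's T-score of 47 expressed on the quotient scale.
print(round(convert(47, "T", "quotient"), 1))
# A quotient of 90 expressed as a z-score and as a T-score.
print(round(convert(90, "quotient", "z"), 2), round(convert(90, "quotient", "T"), 1))
```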


Using the Standard Error 
of Measurement 

The standard error of measurement is an impor¬ 
tant statistic that helps examiners to control statis¬ 
tically for some of the error that creeps into the 
testing situation. Table 3.3 contains the standard 
error of measurement associated with both forms of 
the TONI-2 at various ages. In general, the standard 
error of measurement for both forms of the TONI-2 
across all ages is about 3 points. Here, we will discuss 
the uses of this statistic. 

Examiners know that a test score is only an esti¬ 
mate of the subject’s test performance. The upper and 
lower limits of the range in which the true score prob¬ 
ably lies can be calculated by adding the standard 
error of measurement to and subtracting it from an 
individual’s obtained test score. Probability estimates 
are applied. For instance, 68% of the time, a subject’s 
true test score is likely to fall within a range that is 
plus or minus one standard error of measurement 
from the obtained test score. By extending the range 
to plus or minus two standard errors of measurement, 
the examiner increases the confidence interval to 
95%. A three standard error of measurement range 
pushes the confidence interval up to 99%. 

When examiners report TONI-2 scores or inter¬ 
pret them to others, they should provide the standard 
error of measurement. In this way, subsequent users 
of the test score will know the range within which 
the true score probably lies. A place for recording that 
information is provided on the Answer Booklet and 
Record Forms for each of the two TONI-2 forms. 

To see how the standard error of measurement is 
used in practice, look at Figure 3.1 where Al’s 
TONI-2 quotient is recorded in Section II of the 
Answer Booklet and Record Form. The quotient is 98 
and the related standard error of measurement is 3. 
This means that with a 68% level of confidence, the 
examiner can be relatively sure that Al’s true score 
is between 95 and 101. When the level of confidence 
is increased to 95%, the true score range increases to 
92 to 104. And with a 99% confidence level, the exam¬ 
iner will know that Al’s true score is likely to be found 
between 89 and 107. 
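
The arithmetic in this example can be sketched in a few lines. The function name `true_score_range` is hypothetical, and the confidence labels simply restate the approximate 68%, 95%, and 99% levels given in the text.

```python
def true_score_range(obtained: float, sem: float, n_sems: int = 1) -> tuple:
    """Range of plus or minus n_sems standard errors of measurement
    around an obtained score."""
    return (obtained - n_sems * sem, obtained + n_sems * sem)


# Al's quotient of 98 with a standard error of measurement of 3.
for n_sems, confidence in [(1, "68%"), (2, "95%"), (3, "99%")]:
    low, high = true_score_range(98, 3, n_sems)
    print(f"{confidence} confidence: {low} to {high}")
```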


TABLE 3.3

Standard Error of Measurement at Various Ages

Age         SEM for   SEM for
In Years    Form A    Form B

5           4.5       3.7
6           1.5       3.4
7           3.4       3.7
8           3.7       4.2
9           4.0       3.7
10          4.0       3.0
11          3.4       3.0
12          2.6       2.6
13          3.4       3.4
14          3.4       2.6
15          3.0       2.6
16          3.0       2.6
17          3.0       3.0
18          2.6       2.6
19          3.0       2.6
20-29       2.1       2.1
30-49       2.1       1.5
50-85       2.6       2.1


Interpreting Deviant Scores

The previous sections were concerned with the
degree or amount of deviance observed in a particular
score. Examiners also are concerned with the direction
of the deviance; that is, is the score in question
unusually high or unusually low? High scores and low
scores both deviate significantly from the norm and
both deserve diagnostic consideration.

Significantly low scores are percentile ranks
below 25 or quotients below 85. Scores of this low
magnitude may be indicative of mental retardation,
developmental disabilities, or other cognitive disorders.
If they are confirmed by other diagnostic or
clinical data, low scores indicate a need for some kind
of intervention, such as therapy or special class
placement.

Significantly high scores are percentile ranks 
above 75 or quotients above 115. If confirmed, scores 
this high may be indicative of intellectual giftedness. 
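
The quotient cutoffs just described can be written as a small screening check. This is only a sketch with a hypothetical function name; it flags the direction of deviance for a quotient and does not stand in for the confirming diagnostic or clinical data that the text requires.

```python
def direction_of_deviance(quotient: float) -> str:
    """Classify a TONI-2 quotient using the cutoffs stated in the text:
    below 85 is significantly low, above 115 is significantly high."""
    if quotient < 85:
        return "significantly low"
    if quotient > 115:
        return "significantly high"
    return "not deviant"


for q in (60, 98, 120):
    print(q, direction_of_deviance(q))
```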

While we can assume that a person who consist¬ 
ently performs well on the TONI-2 is not mentally 
retarded, we cannot assume that a person who scores 
poorly is mentally retarded. Examiners are encour¬ 
aged to consider a variety of explanations for low 
scores on the TONI-2. Such variables as general 
health, test anxiety, misunderstanding of test instruc¬ 
tions, lack of interest, and even the time of day when 
a test is administered can affect an individual’s per¬ 
formance. Alternative explanations of low scores 
should be sought and any diagnosis or placement deci¬ 
sion should be made only after further testing and an 
analysis of the person’s performance in school, on 
the job, within the community, or in daily living 
activities. 






Comparisons between the TONI-2 and tests or
activities loaded with spoken or written language
tasks could be helpful in developing these alternate
explanations and in estimating the potential of subjects
who are nonverbal, illiterate, non-English-speaking,
or otherwise hampered by poor language skills
or limited cultural exposure. There is strong reason,
for example, to suspect language deviance rather than
intellectual deviance when a subject exhibits a profile
characterized by near normal or above average
TONI-2 performance combined with subaverage performance
on language-loaded measures.

Space is provided in Sections II and III of the 
TONI-2 Answer Booklet and Record Form to record 
and profile scores obtained on other tests that may 
have been administered to the subject. Intelligence 
tests, language tests, measures of adaptive behavior, 
and general achievement batteries may provide inter¬ 
esting and helpful information when compared to the 
TONI-2 results. 


Interpreting Score Differences 

Test scores are rarely considered in isolation. 
They usually are weighed with regard to their rela¬ 
tionship to other diagnostic information and other test 
scores. For this reason, the comparison of test scores 
is an important part of interpretation and diagnosis. 

The TONI-2 scores may be compared to other 
scores to produce a diagnostic profile. Or, they may 
be compared to previous performance on the test in 
order to gauge the success of treatment, therapy, 
special placement, or the passage of time. They also 
may be compared to the scores of other tests of 
achievement or affect to sketch an even broader, more 
holistic pattern. Regardless of the type of score com¬ 
parison, one is led ultimately to two questions: Are 
particular test scores different, and, if so, is the dif¬ 
ference important? 

Examiners will have to decide if (a) the difference
observed between two test scores is the result of random
error or normal fluctuation, or if (b) it represents
an actual difference between the two attributes being
measured. These are statistical questions that are
easily answered. Anastasi (1988) provides her readers
with a simple formula for computing the standard
error of the difference (SED) between two scores at
the 5% level of significance. If the actual difference
exceeds the SED, then the examiner knows that 95
times out of 100 the difference is a true one and should
be regarded as significant.

Examiners can use the following formula to compute
for themselves the SED between a TONI-2 quotient
and a quotient of the same distribution from
another test: $\mathrm{SED} = 1.96\sqrt{9 + SE_m^2}$. In
this formula, 9 is the squared mean standard error of
measurement associated with the TONI-2 and $SE_m^2$
is the squared standard error of measurement
associated with the quotient to which the TONI-2
score is being compared.
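
As a worked illustration of this formula, the sketch below computes the SED and checks whether an observed difference exceeds it. The function names and the comparison test’s standard error of measurement of 3 are assumptions made for the example; the TONI-2 value of 3 and the multiplier 1.96 come from the text.

```python
import math

TONI2_MEAN_SEM = 3.0  # mean standard error of measurement reported for the TONI-2


def standard_error_of_difference(other_sem: float,
                                 toni2_sem: float = TONI2_MEAN_SEM,
                                 z_critical: float = 1.96) -> float:
    """SED = 1.96 * sqrt(toni2_sem^2 + other_sem^2), as given in the text."""
    return z_critical * math.sqrt(toni2_sem ** 2 + other_sem ** 2)


def difference_is_significant(toni2_quotient: float,
                              other_quotient: float,
                              other_sem: float) -> bool:
    """True when the observed difference exceeds the SED."""
    observed = abs(toni2_quotient - other_quotient)
    return observed > standard_error_of_difference(other_sem)


# Hypothetical comparison test whose quotient also has an SEM of 3.
print(round(standard_error_of_difference(3.0), 2))
print(difference_is_significant(98, 85, 3.0))
```

With both standard errors equal to 3, the SED is about 8.3 points, so a 13-point difference would be regarded as significant while a 6-point difference would not.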

Readers should note that the difference between 
two scores is not as reliable as the scores themselves. 
For this reason, examiners or researchers may want 
to compute the standard error of measurement for the 
difference score itself. This degree of precision usually 
is not essential for most educational interpretations. 

Statistical significance is not the sole criterion by 
which score differences should be judged. An observed 
difference may have no practical value at all, despite 
its statistical significance. This is particularly true 
when both scores are essentially normal. For instance, 
when a TONI-2 standard score is between 85 and 115, 
even a statistically significant difference probably 
merits only passing clinical interest. Typically, only 
scores that fall into the deviant range are scrutinized. 


Developing Local and 
Specialized Norms 

The TONI-2 percentile ranks and quotients are 
based on the performance of a large, nationally rep¬ 
resentative normative sample. This group of subjects 
is representative of the United States population as 
a whole, sharing such demographic characteristics as 
gender, race, ethnicity, and socioeconomic status. 
Despite this representativeness, there are some 
instances when school districts or other agencies may 
find it advantageous to develop norms based on the 
performance of individuals who are members of a very 
specific subgroup. Such specialized norms could be 
helpful for handicapped populations, treatment 
groups such as stroke or head injury patients, or indi¬ 
viduals planning to enter a particular profession or 
vocation. A school district may develop local norms 
for its own students, or a sheltered workshop may 
develop specialized norms based on the performance 
of young adults who are mentally retarded and who 
are employed there. Any interpretation of the 
TONI-2 using specialized norms would not be based 
on national performance, as the norms in this manual 
are, but on the performance of the specified subpopu¬ 
lation. For example, a test’s national norms might be 
applied to help establish the diagnosis of mental retar¬ 
dation for a sheltered workshop’s mentally retarded 
employees, but specialized norms would be more help¬ 
ful in estimating and comparing the potential of 
clients being considered for admission or employment. 





The national norms permit comparison to the popula¬ 
tion of mentally retarded individuals as a whole; the 
specialized norms permit comparison to the popula¬ 
tion of workers already successfully employed in the 
sheltered situation. Although both comparisons are 
valuable, there are vast differences between the two 
normative groups. It would be a mistake to mis¬ 
interpret normal performance according to the spe¬ 
cialized norms as average or normal performance 
overall. 

Pluralistic norms for members of ethnic or racial 
minorities are a type of special norms that have been 
advocated by such people as Mercer (1976). They are 
designed to help examiners avoid the pitfalls of bias 
purported to be present in norm-referenced assess¬ 
ment devices, particularly in tests of intelligence or 
aptitude. Although there may be some productive 
uses for norms based on the performance of ethnic and 
racial subgroups, researchers such as Sattler (1988) 
and Jensen (1980) believe that this is a potentially 
dangerous practice which could result in lower expec¬ 
tations for these individuals, particularly if it were 
not made clear which norms (national or pluralistic) 
were the basis of subsequent interpretations. In addi¬ 
tion, Duffey, Salvia, Tucker, and Ysseldyke (1981) 
point out that, “There is far more evidence that the 
use of test data has been biasing rather than the tests 
themselves” (p. 433) (emphasis added). 

Brown and Bryant (1984) suggest that there are
four instances in which local or specialized norms may
be appropriate. The first occurs when the performance
of the subgroup in question is known to be vastly different
from the performance of a test’s existing normative
sample. In general, the mean of the subgroup
should be at least 1 to 1½ standard deviations away
from the mean of the normative group before the difference
is considered to be substantial enough to warrant
the development of special norms. TONI-2 users
may encounter this situation on occasion.

The second instance occurs when the test format
or administration procedures are altered for the subgroup
in question. “Changes that might necessitate
the development of special norms include changes in
time allowed for test administration, changes in the
language in which the test is administered, changes
in the age group to whom the test is administered,
or changes in criteria for correct and incorrect
responses” (Brown & Bryant, 1984, p. 54). For
instance, some TONI-2 examiners may decide to add
verbal instructions to the test administration; or they
may want to administer the test to a large group of
research subjects and, for convenience, decide to project
the items onto a large screen and ask respondents
to mark an answer sheet or use a computerized response
system; or they may decide to administer the test to
very bright four-year-olds. It is possible that some of
these changes would not significantly alter performance
on the test. For this reason, an equivalency study
should be conducted first. If the changes do affect performance,
then special norms are appropriate; if not,
then the existing norms are quite suitable.

The third situation that may call for the development
of special norms is when the test being used does
not have an appropriate set of norms to begin with.
According to Brown and Bryant (1984), “These situations
are rare. At least one well built, nationally
normed test is available for most areas of interest to
special educators” (p. 64). This problem will not confront
users of the TONI-2.

The fourth and final instance in which special 
norms may be deemed appropriate occurs when test 
performance is tied to a specific educational program, 
such as a designated curriculum, a special class place¬ 
ment, or a particular instructional arrangement. It 
is unlikely that a test such as the TONI-2 will be used 
for this purpose. 

Norms based on a representative national sam¬ 
ple always will be appropriate. Additional norms 
based on specialized populations may be appropriate 
in some situations. We encourage users of the TONI-2 
to develop local or specialized norms when they seem 
to be suitable and useful. We also caution them to 
state explicitly which set or sets of norms were applied 
in any interpretation. Readers who are interested in 
developing local or specialized norms should consult 
Brown and Bryant (1984) or Svinicki and Tombari 
(1981). Both of these sources provide step-by-step 
instructions for developing specialized norms and 
clear guidelines for using them. 


Sharing the Results 

The results of the TONI-2 should be shared with 
the people who are legally entitled to receive such 
information. This may include the subject who took 
the test, the parents or guardians of minor subjects, 
and other involved professionals. Unfortunately, shar¬ 
ing test results is not as easy as it sounds! Some 
guidelines may be helpful. 

First, whenever possible, the results of the 
TONI-2 should be reported along with other 
appraisal data such as test scores; observation data; 
and information obtained in interviews, at confer¬ 
ences, and through consultation. It is fruitless and 
sometimes even misleading to report an isolated test 
score or piece of data. Interpretations and recom¬ 
mendations also should accompany test scores and 
appraisal data. Space is provided on the TONI-2 
Answer Booklet and Record Form to record such 
comments. 






Second, the examiner who reports the results of 
the TONI-2 should be thoroughly familiar with the 
format, the content, and the psychometric character¬ 
istics of the TONI-2 as well as with the characteris¬ 
tics of the scores that are reported. The people with 
whom examiners share TONI-2 information will 
expect the examiner to know this information and to 
be able to interpret it in relation to test performance. 

Third, it is a temptation, or perhaps a habit, to 
lace psychological reports with professional jargon. 
A sincere effort on the part of the examiner to con¬ 
vey information in clear, familiar language will be 
appreciated. Rafoth and Richmond (1983) and Shel- 
lenberger (1982) both provide some helpful guidelines 
for the use of plain language in reports. 

Fourth, remember that most parents who have 
consented to intellectual assessment for their children 
are concerned about the outcome. When they come 
to a conference to discuss test results, they usually 
have some specific goals and expectations of their own 
in mind. Professionals who present assessment data 
to these parents need to address the parents’ concerns 
in addition to presenting them with the facts. 

Fifth, the subjects who have been the object of the 
evaluation often remain just that—objects and not 
participants. This is particularly true of children and 
adolescents who take the test. Most of these people 
are curious, if not outright anxious, about their test 
performance. Any recommendations for further eval¬ 
uation or intervention should be shared and discussed 
with the target subjects, too. 

All of this said, one should note that privacy and 
confidentiality are hallmarks of the appraisal process. 
Maintaining appropriate safeguards for privacy and 
confidentiality while sharing psychological records 
and reports is a thorny legal and ethical issue, espe¬ 
cially in the schools. Examiners should be fully 
informed about their responsibilities in these matters. 
A few noteworthy safeguards follow. 

As a confidential document, a psychological report 
may be made available only to those individuals who 
have the written consent of the responsible individ¬ 
ual. When children are being evaluated, the respon¬ 
sible individual usually is their parent or legal 
guardian. Emancipated minors and responsible indi¬ 
viduals of legal age may act for themselves. Accord¬ 
ing to the provisions of The Family Educational 
Rights and Privacy Act of 1974 (PL 93-380) and The 
Privacy Act of 1974 (PL 93-579), these responsible 
individuals—that is, parents, legal guardians, and 
individuals of legal age—have the right to review 
educational records and official files and documents 
concerning themselves and their children or wards. 
Minors generally do not have any right to confiden¬ 
tiality apart from their parents or guardians. The 
legally responsible individual also has the right to 


request that information contained in files be altered 
or eliminated if it is incorrect or violates their right 
to privacy or confidentiality. 

It is unclear whether the test protocols them¬ 
selves, such as the TONI-2 Answer Booklet and 
Record Form, must be provided for review upon 
request. Such records probably are considered acces¬ 
sible in most instances, particularly in the schools. 
In some very limited instances, such as cases of sus¬ 
pected child abuse, a mental health professional 
legally may withhold information, but that is not 
likely to affect the sharing of TONI-2 results. 

Confidentiality is a time-honored bond between 
professionals and their clients. However, the extent 
to which this bond applies to intellectual and psycho¬ 
logical assessment is not clear. Examiners should be 
guided by the rules and regulations of their own 
school district, agency, professional licensing board, 
or other regulatory group. In addition, they may find
Principle 5 of the American Psychological Association’s
(1981) Ethical Principles of Psychologists (which
addresses confidentiality), the procedural guidelines
offered by Hollis and Donn (1979) or Zins and Barnett
(1983), and Knoff’s (1983) discussion to be of special
interest.


Caveat Utilitor

Tests, particularly tests of the highly standardized,
norm-referenced variety, have been elevated to
a precariously high pedestal in the appraisal profession.
Diagnosticians and psychologists have begun to
shy away from the use of professional judgment and 
to place increasing emphasis on test scores. This is 
an unfortunate practice. Tests are of no more or no 
less consequence than any other appraisal technique: 
the analysis of specified products, observation, inter¬ 
views, diagnostic teaching, or even a simple review 
of existing records. Ultimately all of these tech¬ 
niques—tests included—are only as good as the per¬ 
son who uses them. 

We have taken great care to produce an extremely 
well built, psychometrically sound instrument. As 
test authors, we set high standards for the TONI-2, 
and we labored to achieve those standards. For the 
most part, we succeeded. Still the TONI-2 cannot 
diagnose. Diagnosis is an interactive, on-going pro¬ 
cess of hypothesis testing. The administration of tests 
is a mechanical part of the process and it is the inter¬ 
pretation of the results that leads to diagnosis. The 
question must be: What caused this particular score 
or pattern of test scores to occur? 

Sophie earned a quotient of 60 on the TONI-2. We 
must ask why. Is this because Sophie is mentally 





retarded? Perhaps. It also could be because Sophie 
snorted a controlled substance just before the test ses¬ 
sion, because she can't read, because she didn’t under¬ 
stand the instructions, because she doesn’t care, 
because her dog just died. The list of possible explana¬ 
tions for poor test performance goes on and on before 
one even begins to consider reasons outside the child 
that might explain the test score. The test itself could
be unreliable or invalid, incorrectly administered, 
inaccurately scored, or totally inappropriate to the 
situation. The examiner could have been inexperi¬ 
enced, inept, uncaring, or simply hassled and frazzled. 

Why did Sophie score 60 on the test? That is not 
a psychometric question. It cannot be answered by a 
test. Only a skillful and intuitive diagnostician can
provide an answer. Caveat utilitor. 


One final caveat for TONI-2 users. Some readers 
may find our title, the Test of Nonverbal Intelligence, 
presumptuous. Intelligence is a complex and multi¬ 
dimensional construct, and it is true that the TONI-2 
measures only a small sliver of that construct. We 
recognize that it will not usually be a reasonable 
substitute for broad-based tests of intelligence or 
aptitude. But that begs the question somewhat. Can 
any test measure intelligence fully? Perhaps all tests 
including the word “intelligence” in their titles are 
a trifle pretentious. Because we recognize the presumption
inherent in our title, we have been conservative
in our claims for what the TONI-2 can do.
We make no claims that cannot be substantiated 
through data and we repeatedly caution examiners 
against overgeneralization of the TONI-2 results. 



4 

Developing the TONI-2


This chapter contains complete descriptions of the
psychometric procedures that were used to build the
original TONI in 1982 and to revise it this year as
the TONI-2. The procedures by which we accumulated
a pool of potential test items and then built the
two forms of the test are set forth. We describe in
detail the data-collection procedures that were
employed and the statistical analyses that were
undertaken to create the TONI-2 normative tables.
A diverse set of empirical studies designed to establish
the reliability and validity of the test also is
provided.

Specifically, this chapter includes discussions of 
(a) item analysis and selection procedures, (b) stan¬ 
dardization and normative procedures, (c) procedures 
to estimate the test’s reliability, and (d) procedures 
to demonstrate its validity. 


Item Analysis and Selection 

There are several commonly accepted methods for 
building tests. We adopted a traditional approach, 
beginning by accumulating a pool of potential items 
through the process that Anastasi (1988) refers to as 
content validation. By applying empirical criteria for 
item selection and retention, we ultimately selected 
from this pool the 100 items that comprised Forms 
A and B of the original TONI. These same procedures
were repeated when we developed and selected the
ten items added during the revision of TONI-2. 
Creating a large item pool had the net effect of ensur¬ 
ing that the items covered, quite comprehensively, the 
content that we intended the test to measure. 

When selecting items to be added to the original 
TONI to form the TONI-2, a pool of 23 potential items 
was developed. These items were administered to 50 
subjects enrolled in undergraduate courses at Baylor 
University in Waco, Texas. Very bright students were 
selected for this study because we wanted to ensure 
that the retained items were both valid and of suffi¬ 
cient difficulty to be placed at the end of each form. 
That is, the new items needed to be more difficult 
than the items on each form of the original TONI. 
Following item discrimination and difficulty analy¬ 
ses, we retained 10 items, five items for each form of 
the TONI-2. The empirical selection criteria had the 
added benefit of ensuring, to some extent, the relia¬ 
bility and validity of the instrument. Anastasi (1988) 
refers to the combination of these two procedures as 
“a satisfactory compromise” (p. 202) to the dilemma
facing test authors. They are described in consider¬ 
able detail in the succeeding sections. 

Content Validation 

Content validation was the first step we took to 
build validity into the TONI-2 at the earliest possible
stage. These procedures ensured that the test
items were representative of the construct that we 
intended to measure and that the items were con¬ 
sistent with the most current theoretical assumptions 
concerning the evaluation of aptitude and intelli¬ 
gence. 

Content validation can be accomplished in several 
ways. The most common means are building items 
based on curricular materials and instructional man¬ 
uals, canvassing the relevant body of literature, 
examining the content and format of appraisal tools 
having similar objectives, and consulting knowledge¬ 
able professionals in the area about their ideas and 
concepts. With the exception of extracting items from 
curricular and instructional materials, a procedure 
that is irrelevant for a test like the TONI-2, we 
employed all of these techniques to one degree or 
another. 

We were particularly interested in the contents 
and formats used in other nonverbal and performance 
tests of intelligence. Among the tests we studied most 
closely were Raven’s Standard Progressive Matrices 
(1938, 1960) and Coloured Progressive Matrices (1947),
The Leiter International Performance Scale (Leiter, 
1948), and the performance subtests of the various 
Wechsler scales (Wechsler, 1981, 1974, 1967). 

In the literature, we were interested in the early 
work of Guilford (cf. 1956a) in testing for the production
aspects of his Structure of Intellect Model.
Experimental psychologists such as Bourne (1963,
1967; Bourne & Guy, 1968a, 1968b), Dominowski
(1966), Gagne (1959), and Glucksberg (1964), noted for 
their work in the area of problem solving, also were 
particularly helpful. We were also interested in 
Gardner's (1983) theory of multiple intelligences. 
A complete discussion of the theoretical positions 
that guided the development of the TONI-2 and its 
items is found in the first chapter of this manual. 

Following this review, we began to construct the 
first version of the TONI. We devised 307 problem 
solving tasks in which the content was not symbolic 
and had no inherent meaning. This rather daunting 
pool of abstract/figural problem solving tasks then 
was submitted to a panel of university professors, 
psychologists, psychometrists, and special educators 
for critical review. Items that these professionals 
deemed to be ambiguous, symbolic, or linguistic in 
any way were eliminated. One hundred eighty-three 
items remained following this review and they were 
subjected to the statistical analyses described in the 
next section. 

When we began to build the second edition of the
TONI, we decided that the test would benefit from the 
addition of some more difficult items. Following these 
same content validation procedures, we constructed
an additional 23 items. These, too, were subjected to
empirical item analytic procedures. 

Empirical Item Selection Criteria 

Items for the original version of the TONI and the 
additional items for the TONI-2 were chosen on the 
basis of specific data-based characteristics. There is 
a rather intimidating array of item analytic proce¬ 
dures in the bag of tools available to a test builder. 
Oosterhof (1976) investigated over 50 different item 
analytic procedures and discovered that they all pro¬ 
duced remarkably analogous results. Armed with this 
bit of information, we elected to use two traditional 
indices that are very simple to compute but that also 
yield powerful results: item discrimination coeffi¬ 
cients and item difficulty percentages. 

Item Discrimination. Item discrimination coeffi¬ 
cients for the TONI were calculated by correlating the 
response to each item with the total score of the test. 
These item-to-total correlations indicate the degree to
which the items of a scale are all measuring the same construct.
This procedure not only reflects on the validity of the 
instrument, but it also virtually ensures the homo¬ 
geneity and internal consistency of the final test. 

These item-to-total correlations are called item 
discrimination coefficients because they indicate the 
degree to which the item discriminates between sub¬ 
jects who are good at a particular skill or who have 
an abundance of the trait being measured and those 
subjects who are less skillful or less well endowed. An 
item is called discriminating when it is answered cor¬ 
rectly by subjects who subsequently earn high total 
scores on the test and is answered incorrectly by sub¬ 
jects who earn low scores. 

Items with discrimination coefficients that are 
statistically significant and that range in size 
between .30 and .80 were retained for both editions 
of the TONI. Statistical significance portends a strong 
probability that there is a nonrandom relationship 
between the item and the total test score. The .30 base 
criterion for magnitude guarantees that each item is 
making a meaningful contribution to the test; the .80 
ceiling further ensures that each item is making a 
unique contribution to the test and is not duplicating 
information already provided by the other test items 
(Anastasi, 1988; Guilford, 1956). 
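
The magnitude criteria can be illustrated with a brief computational sketch. The Python code below is an illustration only and is not the procedure used in the original analyses: the response matrix is invented, the function names are hypothetical, and the test of statistical significance is omitted. Each item is correlated with the total score (which, as described above, includes the item itself), and an item is flagged for retention when its coefficient falls between .30 and .80.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_discrimination(responses):
    """Item-to-total correlation for each column of a subjects-by-items 0/1 matrix."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[j] for row in responses], totals) for j in range(n_items)]


def meets_magnitude_criterion(r, floor=0.30, ceiling=0.80):
    """True when a coefficient lies within the .30 to .80 band."""
    return floor <= r <= ceiling


# Invented data: six subjects (rows) by four items (columns); 1 = pass, 0 = fail.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]

coefficients = item_discrimination(responses)
retained = [meets_magnitude_criterion(r) for r in coefficients]
```

An item that every subject passes or fails has no variance, so its coefficient is undefined; such items would have to be set aside before this sketch is applied.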

Item discrimination coefficients were calculated 
for all of the items in the original TONI pool based 
on the performance of 322 subjects grouped into eight 
age or grade intervals: (a) kindergarten students, (b) 
first grade students, (c) third grade students, (d) fifth 
grade students, (e) seventh grade students, (f) ninth
grade students, (g) young adults ages 18-35 years,
and (h) older adults ages 65-85 years. A total of 100 
items were tentatively selected on the basis of this 
preliminary analysis and the test was reconstructed 
as necessary. Fifty items were assigned to Form A and 
50 to Form B, taking care to ensure that the discrim¬ 
inating powers associated with the items on both 
forms were approximately equal. 

A final confirmatory analysis was completed 
based on the performance of 900 subjects grouped into 
18 age intervals. Fifty subjects were selected from the 
normative sample at each age level (5-19 years) and 
age cluster (20-29 years, 30-49 years, 50-85 years).
The new TONI-2 items were administered to 443 sub¬ 
jects in the older age ranges of 14.6-85.0 years; it was 
reasoned that very young children would not have 
reached these difficult items when they took the test. 
The median item-to-total correlation coefficients are 
reported at various age intervals in Table 4.1. All of 
the medians reported in Table 4.1 are significant at 
the .05 level of confidence or beyond, all exceed the 
.30 base criterion, and all but three are below the .80 
ceiling. 

TABLE 4.1

Median Item-Total Correlations at Various Ages
(Decimals Omitted)

Age in Years    Form A    Form B
5               63        66
6               37        63
7               64        58
8               56        44
9               55        54
10              52        54
11              63        57
12              60        57
13              58        61
14              60        65
15              67        65
16              57        69
17              64        50
18              63        62
19              62        67
20-29           73        79
30-49           81        84
50-85           75        81

Item Difficulty. Item difficulty is the percentage of
test subjects who pass an item. An easy item will be
answered correctly by a large number of individuals,
while a difficult item will be answered correctly by 
only a few people. If 100% of the subjects tested give 
the correct answer to an item, then that item is not 
a very valuable one; high performing subjects and low 
performing subjects alike were able to pass it, so the 
item is not a successful discriminator. The same 
would be true of an item answered correctly by vir¬ 
tually no one; even the highest scoring subjects could 
not pass it, rendering the item relatively useless as 
a discriminator. 

Anastasi (1988) suggests that a wide range of item 
difficulty is desirable. She suggests a range of approx¬ 
imately 15% to 85% difficulty with an average around 
50%. We adopted this range for the selection of 
TONI-2 items. 
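
As a minimal sketch of this computation, the Python code below calculates the percentage of subjects passing each item and checks it against the 15% to 85% range. The data are invented and the function names are hypothetical; the example is offered only to make the arithmetic concrete.

```python
def item_difficulty(responses):
    """Percentage of subjects passing each item in a subjects-by-items 0/1 matrix."""
    n_subjects = len(responses)
    n_items = len(responses[0])
    return [100.0 * sum(row[j] for row in responses) / n_subjects
            for j in range(n_items)]


def in_difficulty_range(percent_passing, low=15.0, high=85.0):
    """True when the percentage passing lies within the adopted range."""
    return low <= percent_passing <= high


# Invented data: four subjects (rows) by three items (columns).
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]

difficulty = item_difficulty(responses)   # [100.0, 50.0, 0.0]
acceptable = [in_difficulty_range(p) for p in difficulty]
```

In this toy example only the second item falls within the range. The first is passed by everyone and the third by no one, so neither can discriminate among subjects.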

The subjects for this analysis were the same sub¬ 
jects described earlier for the confirmatory item 
discrimination study of the 100 original TONI items 
and the 10 more difficult items added during the 
TONI-2 revision. The median item difficulties are 
reported in Table 4.2. All of them meet our estab¬ 
lished criteria. Items were reordered for the TONI-2 
revision based on percentage of difficulty. Items with 
zero variance were deleted from the analysis at 
specific age levels. 

TABLE 4.2

Median Item Difficulty Percentages
at Various Ages
(Decimals Omitted)

Age in Years    Form A    Form B
5               52        16
6               42        24
7               06        30
8               24        42
9               24        54
10              50        36
11              27        25
12              08        25
13              26        34
14              32        32
15              51        28
16              35        64
17              44        36
18              62        51
19              54        60
20-29           62        77
30-49           44        44
50-85           32        39




The goal of item analysis was to produce two valid 
and reliable forms of the test containing as few items 
as possible. Why have a 200 item test when half that 
number of items may prove to be equally discriminat¬ 
ing and require substantially less administration 
time? The 110 items that make up the TONI-2 satisfy 
the empirical criteria that we established at the 
outset for item discrimination coefficients and item 
difficulty percentages. Thus, all of the TONI-2 items 
not only derive from an appropriate and accepted body 
of literature but also survived the subjective scrutiny 
of a professional review panel and the objective 
scrutiny of statistical analysis. 


Standardization and 
Normative Procedures 

The TONI-2 is a highly standardized, norm ref¬ 
erenced instrument. This section of the manual doc¬ 
uments those characteristics. In particular, the 
discussion includes the standardization of the test, the 
demographic features of the normative sample, and 
the types of normative scores that the test generates. 


Standardization 

Standardization, as it is used in this test manual, 
has a very specific definition. According to Hammill 
(1987), the adjective standardized implies three things 
about an appraisal technique. First, it suggests that 
there is a consistent (i.e., standard) method for using 
the procedure, in this case for administering the test. 
Armed with a standard set of administration instruc¬ 
tions, examiners are likely to give the test in the same 
way every time they use it. Chapter 2 of this manual
contains step-by-step administration instructions, the 
same instructions that were provided to the people 
who administered the test to the normative sample. 
Examiners who adhere to the prescribed TONI-2 
administration protocol can be confident that testing 
error or variance due to differences in administration 
procedures has been reduced as much as possible, the 
principal goal of standardization. 

The second characteristic of a standardized instru¬ 
ment is an objective (i.e., standard) system for scor¬ 
ing it. The purpose, once again, is to guarantee that 
most examiners will compute the same score or scores 
under similar conditions. The recommended scoring 
procedures for the TONI-2 are specified in Chapter
2. The fact that these are the directions that were
used to score the TONI-2 during norming and that
all examiners who read this manual have the same
set of directions to follow reduces another potential 
source of testing error. 

The third characteristic of a standardized test is 
a specified frame of reference (i.e., standard) for inter¬ 
preting the test results. For the TONI-2, we have 
chosen a normative frame of reference to permit 
examiners to compare the performance of a test sub¬ 
ject to the performance of agemates in the normative 
sample. By providing easy-to-use normative tables in 
the Appendix and specific guidelines for interpreting 
standard scores in Chapter 3, we have increased the 
probability that examiners will reach comparable 
conclusions when they interpret the results of the 
TONI-2, thereby reducing another external source of 
test error. 

Hammill (1987) continues his discussion of stan¬ 
dardization by pointing out that the developers of 
tests or other appraisal techniques cannot know how 
successful they have been at standardizing their procedures
until they study the reliability and validity of those procedures.
He says that empirical reliability and validity are the 
proof the method is working. The last two sections of 
this chapter, those devoted to reliability and validity, 
provide the requisite proofs of the TONI-2’s standard¬ 
ization. In the meantime, since we have selected a 
normative frame of reference for interpreting the 
TONI-2, we will describe next the demographic char¬ 
acteristics of the normative sample and the nature 
of the normative scores that are reported. 


Demographic Characteristics of the 
Normative Group 

The TONI-2 normative sample includes 2,764 
subjects ranging in age from 5-0 through 85-11 years 
and residing in 30 states: Alabama, Alaska, Arizona, 
California, Connecticut, Florida, Georgia, Hawaii, 
Idaho, Illinois, Iowa, Kansas, Kentucky, Maryland, 
Massachusetts, Michigan, Montana, Nebraska, New 
York, North Carolina, Ohio, Oregon, Pennsylvania, 
South Dakota, Texas, Utah, Virginia, Washington, 
West Virginia, and Wisconsin. Of these subjects, 68%
were tested in the Spring and Fall of 1981 and the 
remaining 32% were tested in the Spring and Fall of 
1989. 

Subjects were selected randomly to participate in 
the norming of the TONI-2, and no subjects within 
the designated age range were excluded from testing 
except those diagnosed as having profound intellec¬ 
tual disabilities. Subjects were tested individually or 
in small groups of fewer than five people. Trained examiners
followed the administration protocol outlined in
Chapter 2 throughout the normative testing period.
Most subjects took both Form A and Form B. The 
order of administration was randomized, and the sec¬ 
ond form was given immediately after the first. 

Table 4.3 describes the known demographic char¬ 
acteristics of the normative sample. Subject age, 
gender, predominant race, ethnicity, geographic loca¬ 
tion, domicile, and principal language spoken in the 
home are specified. We also identify the educational 
and occupational status of the adult subjects in the
normative sample or of the parents of children in the 
sample. 

To help TONI-2 users determine if the sample 
adequately represents the reference population, 
Table 4.3 also includes comparative statistics for the 
U.S. population. These comparative data were 
extracted from the Statistical Abstract of the United 
States (1985). In most instances, the percentages 
reported for the normative sample are similar to those
reported for the reference group, indicating that the 
members of the TONI-2 normative sample do 
resemble their peers across the United States and 
therefore constitute a representative sample of the 
population. 

As a further test of the geographic representa¬ 
tiveness of our sample, we computed multivariate 
analyses of variance among the mean scores from the 
four geographic regions at each age interval for which 
we report normative scores. The F-ratios were not 
significant. 

The size of a normative sample is as important 
as its representativeness. In A Consumer's Guide to 
Tests in Print, Hammill, Brown, and Bryant (1989) 
write that ideally the normative sample should 
include 1,000 or more subjects. Clearly, the TONI-2 
normative group (N = 2,764) meets this criterion. In 
addition, psychometrists contend that there should be 
at least 75 subjects within each one year age
interval to ensure sufficient dispersion of raw scores
for the computation of percentile ranks and standard
scores. “The . . . exception to these size guidelines,”
according to Consumer’s Guide, “is a test that extends
well into the adult years . . . . These tests were not
penalized . . . if they met the relevant criteria for the
school-aged population and if they demonstrated that
the age groupings employed for the adult population
were derived through empirical means” (Hammill,
Brown, & Bryant, 1989, p. 17). Table 4.3 indicates
that the specification of 75 subjects per one year age
interval for the school age population has been met.
In addition, there are 100 or more subjects for each 
of the adult age intervals, intervals that were estab¬ 
lished empirically according to the procedures out¬ 
lined in the next section for developing the TONI-2 
normative tables. 
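
The size guidelines can be checked directly against the age counts listed in Table 4.3. The short Python sketch below is an illustration only: it copies those counts and tests them against the two guidelines quoted above (1,000 or more subjects overall and at least 75 subjects in each one year interval). The variable names are hypothetical.

```python
# Sample sizes copied from the Age rows of Table 4.3.
school_age_counts = {
    5: 106, 6: 185, 7: 138, 8: 154, 9: 187, 10: 244, 11: 230, 12: 162,
    13: 166, 14: 100, 15: 117, 16: 81, 17: 80, 18: 82, 19: 85,
}
adult_counts = {"20-25": 193, "26-49": 241, "50-85": 213}

total_n = sum(school_age_counts.values()) + sum(adult_counts.values())
smallest_one_year_n = min(school_age_counts.values())

meets_total_guideline = total_n >= 1000             # 1,000 or more subjects
meets_interval_guideline = smallest_one_year_n >= 75  # 75 per one year interval
adults_have_100 = all(n >= 100 for n in adult_counts.values())
```

The counts sum to the reported N of 2,764, and the smallest one year interval contains 80 subjects.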


Normative Scores 

Normative scores provide a meaningful way for 
examiners to report and interpret performance on a 
norm-referenced instrument. The raw scores earned 
by the members of the normative sample are trans¬ 
formed into a normalized distribution and may be 
reported in any one of several standard score distribu¬ 
tions. Two kinds of normative scores are used on the 
TONI-2, deviation quotients and percentile ranks. 
These scores will be described here. Earlier, in 
Chapter 3, guidelines were suggested for interpreting 
them. 

There are no age or grade equivalent scores avail¬ 
able for the TONI-2. We concur with the various pro¬ 
fessional organizations (e.g., American Psychological 
Association, 1985; International Reading Association, 
1980) and psychometrists (e.g., Aiken, 1988; Salvia 
& Ysseldyke, 1988) who discourage the use of these 
scores. Space does not permit a lengthy discussion of 
the difficulties associated with age and grade equiv¬ 
alent scores in this manual, but we can paraphrase 
Hammill, Brown, & Bryant (1989) who note that
“their use promotes a concept of normal performance
that is inaccurate and misleading, . . . [they may be]
generated for an age or grade level that may not
actually have been tested, . . . [they imply] an even
rate of development or change in the attribute being
measured, . . . [and they] are ordinal scales . . . [which]
hinders the direct comparison or profiling of scores”
(p. 7). 

Percentile Ranks. Percentile ranks are useful and 
easily understood scores that can be computed for 
both forms of the TONI-2. Percentile ranks enjoy 
great popularity in educational and psychological 
circles because they can be readily interpreted to 
parents and students as well as to professionals. They 
indicate, quite simply, the percentage of subjects who 
earned scores above or below a particular point. If a 
raw score converts to a percentile rank of 40, for 
instance, it indicates that 40% of the normative sam¬ 
ple earned raw scores of the same magnitude or lower, 
and that 60% scored above that level. A disadvantage 
of percentile ranks is that they are ordinal data and 
the distance between points is not equal. For instance, 
the actual difference in performance indicated by the 
distance between percentile ranks of 5 and 10 is much 
greater than the difference in performance indicated 
by the same 5-point distance between percentile ranks
of 50 and 55.
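
Both points can be made concrete with a small sketch. The Python code below is illustrative only: the normative scores are invented, the function name is hypothetical, and a normal distribution is assumed for the second half of the example. It first computes a percentile rank as the percentage of scores at or below a given raw score, and then compares the distance, in standard deviation units, spanned by a 5-point gap in the tail with an equal 5-point gap near the middle of the distribution.

```python
from statistics import NormalDist


def percentile_rank(raw_score, norm_scores):
    """Percentage of normative scores equal to or lower than raw_score."""
    at_or_below = sum(1 for s in norm_scores if s <= raw_score)
    return 100.0 * at_or_below / len(norm_scores)


# Invented normative raw scores for a single age interval.
norm_scores = [3, 5, 5, 6, 7, 8, 8, 9, 10, 12]
rank_of_6 = percentile_rank(6, norm_scores)   # 4 of 10 scores are 6 or lower: 40.0

# Under a normal curve, equal percentile gaps are not equal score distances.
z = NormalDist().inv_cdf
gap_in_tail = z(0.10) - z(0.05)     # percentile ranks 5 to 10
gap_in_middle = z(0.55) - z(0.50)   # percentile ranks 50 to 55
```

Under the normality assumption, the gap between ranks 5 and 10 spans roughly 0.36 standard deviations, while the gap between ranks 50 and 55 spans roughly 0.13.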

TABLE 4.3

Demographic Characteristics of the TONI-2 Normative Group

                                                     Percentage   Percentage of
Characteristics                                      of Sample    U.S. Population

Age
  5 years (N = 106)                                      3.8
  6 years (N = 185)                                      6.7
  7 years (N = 138)                                      5.0
  8 years (N = 154)                                      5.6
  9 years (N = 187)                                      6.8
  10 years (N = 244)                                     8.8
  11 years (N = 230)                                     8.3
  12 years (N = 162)                                     5.9
  13 years (N = 166)                                     6.0
  14 years (N = 100)                                     3.6
  15 years (N = 117)                                     4.2
  16 years (N = 81)                                      2.9
  17 years (N = 80)                                      2.9
  18 years (N = 82)                                      3.0
  19 years (N = 85)                                      3.1
  20-25 years (N = 193)                                  7.0
  26-49 years (N = 241)                                  8.7
  50-85 years (N = 213)                                  7.7

Gender
  Male                                                  47.0          48.78
  Female                                                53.0          51.32

Predominant Race
  Caucasoid                                             80.0          84.74
  Negroid                                               12.0          12.16
  Mongoloid                                              8.0           3.10

Ethnicity
  Anglo/European                                        72.0          75.91
  Black American                                        12.0          11.89
  Hispanic                                               7.0           6.52
  Oriental/Pacific Islander                              7.0           3.54
  Native American/Eskimo/Aleut                           1.0           1.41
  All Others                                             1.0           0.73

Geographic Region of Residence
  Northeast                                             17.0          21.08
  North Central                                         27.0          25.81
  South                                                 38.0          34.30
  West                                                  18.0          18.81

Domicile
  Urban and Suburban                                    74.0          78.05
  Rural                                                 26.0          22.05

Educational Attainment of Parents of Minor Subjects
  Completed 0-11 grades                                 12.0          15.67
  Graduated high school                                 50.0          49.21
  Attended college or technical school                  17.0          19.30
  Graduated college                                     13.0          10.51
  Completed postgraduate training                        8.0           5.31

Educational Attainment of Adult Subjects
  Completed 0-11 grades                                 15.0          15.67
  Graduated high school                                 46.0          49.21
  Attended college or technical school                  16.0          19.30
  Graduated college                                     15.0          10.51
  Completed postgraduate training                        9.0           5.31

Percentile ranks are derived directly from the raw
score distribution of the normative sample. In the
development of the TONI-2, raw score means and
standard deviations were computed for each of the two
forms at every six-month age interval between 5-0
and 18-11 years, every one year interval between 19-0
and 24-11 years, and every five year interval between
25-0 and 85-11 years. When adjacent means were not
significantly different, the age intervals were combined.
This process was repeated until the final age
groups for the two normative tables were identified.
The raw score distributions for each age interval then
were transformed into the desired percentile distribution.

Table A in the Appendix contains the percentile 
ranks for TONI-2 Form A; Table B contains that 
same information for Form B.

Deviation Quotients. Deviation quotients are a 
type of standard score that can be reported for both 
Form A and Form B of the TONI-2. Standard scores 
are expressed in standard deviation units to indicate 
a score’s distance from the mean or average score of 
the normative sample. There are several common 
standard score distributions, each with a specified 
mean and standard deviation. For instance, T-scores 
have a mean of 50 and a standard deviation of 10, 
z-scores have a mean of 0 and a standard deviation 
of 1, stanines have a mean of 5 and a standard devia¬ 
tion of 1.96, and deviation quotients have a mean of 
100 and a standard deviation of 15. 
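
Because each of these scales is defined by a mean and a standard deviation, a score on one scale can be restated on another by passing through z. The Python sketch below illustrates that arithmetic for three of the scales named above; the function names are hypothetical. Stanines are left out of the sketch because they are reported as whole-number bands from 1 to 9 rather than as a continuous scale.

```python
# (mean, standard deviation) for three of the scales named in the text.
SCALES = {
    "z": (0.0, 1.0),
    "T": (50.0, 10.0),
    "deviation quotient": (100.0, 15.0),
}


def to_z(score, scale):
    """Express a score as its distance from the mean in standard deviation units."""
    mean, sd = SCALES[scale]
    return (score - mean) / sd


def from_z(z, scale):
    """Place a z-score on the named scale."""
    mean, sd = SCALES[scale]
    return mean + sd * z


def convert(score, source, target):
    """Restate a score from one standard score scale on another."""
    return from_z(to_z(score, source), target)


quotient = 115.0
as_z = convert(quotient, "deviation quotient", "z")   # 1.0
as_t = convert(quotient, "deviation quotient", "T")   # 60.0
```

A deviation quotient of 115 lies one standard deviation above the mean, which corresponds to a z-score of 1 and a T-score of 60.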

Standard scores are especially useful because they 
are interval data. This means that the distance 
between scores is equal and that standard scores can 
be added together, averaged, and otherwise compared 
directly or acted on arithmetically. 

The distribution for the TONI-2 deviation quotients
has a mean of 100 and a standard deviation of
15. Like percentile ranks, standard scores are derived 
directly from the raw score distribution of the nor¬ 
mative sample. We followed the same procedure 
described in the previous section. Raw score means 
and standard deviations were computed for each of 
the two forms at every six-month age interval 
between 5-0 and 18-11 years, every one year interval 
between 19-0 and 24-11 years, and every five year 
interval between 25-0 and 85-11 years. Age intervals 
were combined when adjacent means were not signif¬ 
icantly different. The final raw score distributions for 
each age interval then were transformed into the 
desired quotient distribution. 
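
The transformation itself is not spelled out step by step in this chapter. The Python sketch below shows one common way of producing normalized standard scores and should be read as an assumption rather than as the procedure we followed: each raw score is located by its mid-rank cumulative proportion within the age interval, that proportion is converted to a z-score through the inverse normal curve, and the z-score is rescaled to a mean of 100 and a standard deviation of 15. The data and the function name are invented.

```python
from statistics import NormalDist


def normalized_quotients(norm_scores, mean=100.0, sd=15.0):
    """Map each distinct raw score to a normalized quotient (illustrative method)."""
    n = len(norm_scores)
    unit_normal = NormalDist()
    table = {}
    for raw in sorted(set(norm_scores)):
        below = sum(1 for s in norm_scores if s < raw)
        tied = sum(1 for s in norm_scores if s == raw)
        # Mid-rank proportion; always strictly between 0 and 1.
        proportion = (below + 0.5 * tied) / n
        table[raw] = round(mean + sd * unit_normal.inv_cdf(proportion))
    return table


# Invented raw scores for a single age interval.
norm_scores = [1, 2, 3, 4, 5]
quotient_table = normalized_quotients(norm_scores)
```

In this toy interval the middle raw score sits at the 50th percentile and therefore converts to a quotient of 100; lower and higher raw scores convert to quotients below and above 100, respectively.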

Table A in the Appendix contains the deviation 
quotients for TONI-2 Form A. Table B contains that 
same information for Form B. 


Test Reliability 

Earlier in this chapter, we described the three 
qualities that Hammill (1987) said were characteristics
of standardized assessment procedures, namely,
specific procedures for administering the instrument,
for scoring the obtained responses, and for inter¬ 
preting the results. The TONI-2 has all of these 
qualities and therefore can be called standardized. To 
prove the test’s standardization, though, it is neces¬ 
sary to provide the two empirical proofs of standardi¬ 
zation, reliability and validity. Reliability, the first 
of these two proofs, is the subject of this section. A 
discussion of validity will follow. 

Reliability is studied to determine the amount of 
error variance that is inherent within a particular 
appraisal technique. No measurement system—not a 
ruler, the bathroom scales, or even an atomic clock—is 
completely accurate. Tests in particular are fraught 
with sources of potential error, which their authors 
strive to reduce through the standardization of the 
instrument. By providing examiners with precise 
administration procedures, with exact scoring 
criteria, and with an explicit frame of reference for 
interpreting test scores, a substantial portion of error 
can be eliminated. 

Two other types of test error can be estimated 
through empirical investigations of reliability. The 
first type, content sampling error, arises from the con¬ 
tent of the test itself and can be measured through 
internal consistency reliability studies. The second 
type, time sampling error, results from the incursion 
of time and is measured by studies of stability relia¬ 
bility. 

Reliability is an extremely important concept and 
reliability coefficients serve as an index of a test’s 
utility. Unreliable tests include a significant amount 
of error which in turn produces a certain fickleness 
or inconsistency in the test’s results. No examiner can 
feel particularly comfortable or confident relying on 
error-laden and therefore unstable data to make deci¬ 
sions about the diagnosis or placement of a child or 
about the progress attributed to particular interven¬ 
tion strategies. 

We have been careful to document that the relia¬ 
bility of the two forms of the TONI-2 is sufficient to 
warrant the use of the instruments. The results of 
internal consistency and stability reliability studies 
are reported in succeeding sections, along with inves¬ 
tigations of the instrument’s reliability in regard to 
special populations. 

Before we present reliability data on the TONI-2, 
though, it is necessary to discuss the nature of relia¬ 
bility coefficients themselves. All of our studies either 
produce reliability coefficients or are based on relia¬ 
bility coefficients. How large must these coefficients 
be before they are accepted as evidence of the test’s 
reliability? 

Kelley’s (1927) recommendations were the forerunners
of modern reliability standards. He recommended
magnitudes ranging from .50 for tests that
would be used to measure group performance up to
a nearly perfect .98 for tests that would be used to 
measure differences among several test performances 
by the same individual. Guilford (1956b), Helmstadter 
(1964), and others have argued that requiring coeffi¬ 
cients this large was not practical or reasonable, and 
they settled on standards of .70 or .80 and higher for 
most tests. Salvia and Ysseldyke (1988) believe that 
coefficients in the .80s are sufficient for screening pur¬ 
poses. Most psychometrists, though, advocate higher 
levels of reliability when test results will be used to 
draw conclusions or to make judgments about an indi¬ 
vidual test performance; Guilford, for instance, calls 
for a .94 level of reliability for this purpose, Nunnally 
(1978) and Salvia and Ysseldyke (1988) look for .90s, 
and Aiken (1985) recommends .85. In A Consumer’s 
Guide to Tests in Print, Hammill, Brown, and Bryant 
(1989) conclude that .80 is the minimum acceptable 
level of reliability, while the more rigorous .90 is 
preferred. 

The Consumer’s Guide criteria are the ones that 
we have chosen to adopt for the TONI-2, a test that 
will be used primarily for screening and as a compo¬ 
nent in a comprehensive diagnostic appraisal battery. 
Statistically significant reliability coefficients in the 
.80s and .90s will be accepted as evidence of the 
TONI-2’s internal consistency reliability when appro¬ 
priate age controls are applied. 

Internal Consistency Reliability 


Content sampling error is estimated through 
studies of internal consistency reliability. The content 
of a test is its items, and internal consistency relia¬ 
bility is concerned with the degree of interrelation¬ 
ship or intercorrelation among the test’s items. The 
goal of test construction is to build a test or a scale 
in which the items all measure the same construct 
or content. For instance, the TONI-2 is intended to 
measure one very specific and discrete aspect of intelligent
behavior, namely precision at solving problems
with abstract/figural content. Items that
measure other types of intelligent behaviors would 
only muddy the conceptual waters if they were 
included on the TONI-2. 

If the items on a particular scale turn out to be 
unrelated, then they probably are not measuring the 
same construct, in which case the content sampling 
error of the scale will be quite large and its internal 
consistency reliability coefficients will be correspond¬ 
ingly low in magnitude. If, on the other hand, the 
items are significantly intercorrelated and appear to 
be homogeneous, then the content sampling error will 
be relatively low and internal consistency reliability 
will be high. 


Researchers and test authors have several sta¬ 
tistical procedures to choose from when they under¬ 
take studies of internal consistency reliability. The 
most common procedures are the Spearman-Brown 
corrected split-half technique, Hoyt’s procedure, the 
various Kuder-Richardson formulae, the coefficient 
Alpha formula, and the immediate test-retest with 
alternate forms method. The last two techniques were 
used to analyze the reliability of the TONI-2 and 
those results are described next. 

Readers are reminded that the item discrimina¬ 
tion coefficients reported earlier in Table 4.1 presaged 
the internal consistency reliability of the test. One 
criterion for item selection was the item-to-total cor¬ 
relation. This particular criterion enhances the homo¬ 
geneity of a test or a scale and therefore increases the 
internal consistency reliability of the measures. 

Coefficients Alpha. Cronbach’s (1951) coefficient 
Alpha yields the average of all the possible split-half 
correlations that can be extracted from a test. It is 
one of the most rigorous of the procedures used to estimate
an instrument’s internal consistency reliability.
Coefficients Alpha are reported more frequently
in test manuals now that high speed computers have 
made the complex calculations more practicable. 
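
For readers who wish to see the calculation, the Python sketch below applies the standard variance form of coefficient Alpha, k/(k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. It is an illustration with invented data and a hypothetical function name, not a reproduction of the analyses reported here.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Coefficient Alpha for a subjects-by-items matrix of item scores."""
    k = len(responses[0])
    item_variances = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Invented data: six subjects (rows) by four items (columns); 1 = pass, 0 = fail.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]

alpha = cronbach_alpha(responses)
```

Population variances are used for both the items and the totals; sample variances would give the same result, provided the same kind is used in both places. A set of items that every subject answers identically across items yields an Alpha of 1.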

The Alpha technique was applied to the 900 
TONI-2 protocols described earlier for the confirma¬ 
tory item analysis study. Coefficients Alpha for 
TONI-2 Forms A and B are reported in Table 4.4. One 
of the coefficients associated with Form A is .81. All 
of the remaining coefficients for both forms are in the 
.90s. The averaged Alphas reported in that table, .95 
and .96 respectively, were computed by the z-transfor- 
mation method to estimate the internal consistency 
of the two forms across all of the age intervals studied. 

Immediate Test-Retest with Alternate Forms. 
The principle underlying this procedure for estimat¬ 
ing internal consistency is similar to that of the split- 
half technique. Rather than using two halves of the 
same test, though, one uses two equivalent forms of 
the test. Scores on both forms of the test are
intercorrelated, providing another estimate of content
or internal consistency reliability. 

During the norming of the TONI-2, 2,110 mem¬ 
bers of the normative group took both forms of the 
test. Form A and Form B were administered back-to- 
back and the order of administration was randomized. 
Pearson product-moment coefficients of correlation 
were computed at each year from 5-19 years and at 
the age intervals 20-29 years, 30-49 years, and 50-85 
years. The resulting coefficients are reported in Table 
4.5. All of the coefficients are .80 or higher and four are
in the .90s. The mean coefficient is .86. 
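A minimal sketch of the alternate forms computation follows. It assumes each subject has a raw score on Form A and on Form B; the score lists and the function name are hypothetical and are shown only to make the Pearson product-moment calculation concrete.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical raw scores for six subjects who took both forms back-to-back.
form_a = [12, 18, 25, 31, 22, 15]
form_b = [14, 17, 27, 29, 24, 13]
r_ab = pearson_r(form_a, form_b)
print(round(r_ab, 2))
```

In practice the coefficient would be computed separately within each age interval, as was done for Table 4.5.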






TABLE 4.4

Coefficients Alpha for the TONI-2
at Various Ages
(Decimals Omitted)

Age in
Years        Form A    Form B

5              91        94
6              81        95
7              95        94
8              94        92
9              93        94
10             93        96
11             95        96
12             97        97
13             95        95
14             95        97
15             96        97
16             96        97
17             96        96
18             97        97
19             96        97
20-29          98        98
30-49          98        99
50-85          97        98

Mean           95        96


TABLE 4.5

Immediate Test-Retest with Alternate Forms
Reliability Coefficients at Various Ages
(Decimals Omitted)

Age in
Years          N      r ab

5              75       80
6             111       86
7              89       84
8             107       85
9             138       81
10            192       85
11            184       82
12            130       85
13            139       84
14             77       81
15             89       90
16             72       86
17             71       87
18             74       90
19             80       92
20-29         204       85
30-49         128       82
50-85         150       95

Total        2110
Mean                    86


Stability Reliability

Time sampling error is estimated through studies
of stability reliability. Studies of this type of error are
concerned with the degree to which a subject’s test
performance remains stable or constant over time.
The goal, obviously, is to build a test or a scale that
will generate invariable results and that because of
its stability can be accepted as a solid basis for
diagnostic decisions.

If a test is not stable, performance will vary considerably
from one testing period to another. The time
sampling error will therefore be quite large and the
test’s stability reliability coefficients will be low in
magnitude. If, on the other hand, test performance
is relatively stable over time, then the time sampling
error will be low and the test’s stability reliability
will be correspondingly high.

Obviously, stability reliability is confounded
somewhat by internal consistency reliability. The
results of test-retest studies reflect not only the time
sampling error that is introduced by the method of
the investigation, but also the content sampling
error that is inherent to the test. Theoretically,
stability reliability cannot exceed internal consistency
reliability for this reason.

There are two principal means of studying stabil¬ 
ity reliability. The first is the delayed test-retest 
method in which a test is administered to the same 
group of students on two different occasions. Because 
of the problems inherent in this technique—recall of 
items, practice effect, and changes in test situations, 
to name only a few (Anastasi, 1988)—we chose to use 
the second procedure, the delayed test-retest with 
alternate forms method. 

When equivalent forms of a test are administered 
to the same subjects on two different occasions, it is 
possible to estimate both content and time sampling 
error. Anastasi (1988) recommends measuring the 
content sampling error of both forms of the test and 
then extracting that known error to estimate the time 
sampling error. Clearly, the actual interaction of time 
sampling and content sampling error cannot be deter¬ 
mined precisely. 

TONI-2 Form A was administered to 39 subjects, 
including 19 males and 20 females ranging in age 
from 7-9 to 15-9 years (M = 11.6). Form B then was 
administered 7 days later. The mean content sampling
error was determined to be .95 for Form A and
.96 for Form B using the coefficient Alpha procedures
described earlier in the internal consistency reliability
section of this manual (Table 4.4). Extracting
these two figures using Anastasi’s method, we estimated
that the stability reliability of the TONI-2 was
.85.

Reliability with Special Populations

The TONI-2 undoubtedly will be administered to 
subjects who deviate in some way from the subjects 
in the normative sample. It is very likely that this 
test will be used with individuals who are mentally
retarded, learning disabled, deaf or hearing impaired,
dyslexic, or intellectually gifted, as well as with indi¬ 
viduals who do not speak English or who have suf¬ 
fered a stroke or some other form of cerebral trauma. 
Internal consistency reliability studies with some of 
these special populations are described next and sum¬ 
marized in Table 4.6. Many of these studies use the 
Kuder-Richardson Formula #21 (KR-21) to estimate
internal consistency. This is an easy formula to com¬ 
pute, but it sacrifices some accuracy and it tends to 


underestimate the reliability of tests comprised of 
items that are not of equal difficulty. 
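The trade-off described above can be illustrated with a short sketch. It assumes the standard textbook forms of KR-21, which needs only the number of items, the mean, and the variance of the total scores, and of KR-20, which also needs the item difficulties. The function names and the response matrix are hypothetical and are not TONI-2 data.

```python
from statistics import mean, pvariance


def kr21(total_scores, n_items):
    """KR-21: uses only the mean and variance of the total scores."""
    k = n_items
    m = mean(total_scores)
    var = pvariance(total_scores)
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * var))


def kr20(item_scores):
    """KR-20: uses the proportion passing each 0/1 item."""
    k = len(item_scores[0])
    n = len(item_scores)
    var = pvariance([sum(row) for row in item_scores])
    sum_pq = 0.0
    for col in zip(*item_scores):
        p = sum(col) / n          # proportion passing this item
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var)


# Hypothetical 0/1 responses; item difficulties range from .8 to .2.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in responses]
print(round(kr20(responses), 2), round(kr21(totals, 4), 2))  # 0.8 0.67
```

Because the items in this toy example differ in difficulty, KR-21 comes out lower than KR-20, which is the underestimation referred to in the text.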

Reliability with Subjects Who Are Mentally 
Retarded. The TONI-2 was administered to 10 sub¬ 
jects being served in a public school program for the 
educable mentally retarded (EMR). Form A and Form 
B were administered back-to-back. KR-21 coefficients 
were calculated and immediate test-retest with alter¬ 
nate forms correlations also were calculated. The 
effects of age were controlled through a partial cor¬ 
relation procedure. The two KR-21 coefficients are .75 
and .77, respectively, and the immediate alternate 
forms coefficient is .81. 
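The manual does not spell out the partial correlation procedure. A minimal sketch of the standard first-order partial correlation, which removes the linear effect of one control variable (here, age) from the relationship between two scores, is given below. The function name and the three input correlations are hypothetical.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values: Form A with Form B (.80), and each form with age (.60).
print(round(partial_r(0.80, 0.60, 0.60), 4))  # 0.6875
```

When neither score is related to age, the partial coefficient equals the ordinary correlation.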

Reliability with Subjects Who Are Learning Dis¬ 
abled. The TONI-2 was administered to 27 subjects 
diagnosed as learning disabled (LD). Form A and 
Form B were administered back-to-back. KR-21 coef¬ 
ficients were calculated and immediate test-retest 
with alternate forms correlations also were calcu¬ 
lated. The effects of age were controlled through a par¬ 
tial correlation procedure. The two KR-21 coefficients 
are .78 and .82, respectively, and the immediate alter¬ 
nate forms coefficient is .92. 


TABLE 4.6

Internal Consistency Reliability of TONI-2 with Special Populations
(Decimals Omitted)

                                                    Mean Age   KR-21       Coefficients  Immediate
                                    Study           in Years               Alpha         Alternate
Group                               #*      N                  A    B      A    B        Forms

Mentally Retarded                   1       10      11.4       75   77                   81
Learning Disabled                   1       27      10.9       78   82                   92
Dyslexic                            4       61      10.4       71   83
Head Injured                        2       103     30.8       89
Deaf                                1       46      16.1       90   91                   87
Gifted-Secondary                    2       49      14.8                   92
Gifted-Elementary                   2       65      6.2                    91
Spanish Speaking (Mexico)           3       1,652   13.7                   88   86       90
Spanish Speaking (Chile)            6       118     18.1                   91
Bilingual/Fully English Proficient  5       48      9.5        82
Limited English Proficient          5       21      8.7        67
Non English Proficient              5       11      7.8        83

*1. Brown, Sherbenou, & Johnsen (1982)
 2. Brown, Sherbenou, & Johnsen (1990)
 3. Garcia (1988)
 4. Salas (1989)
 5. Lindvig (1989)
 6. Siriany (1989)





Reliability with Subjects Who Are Deaf. The 
TONI-2 was administered to 46 students at the Texas 
School for the Deaf. Form A and Form B were admin¬ 
istered back-to-back. KR-21 coefficients were calcu¬ 
lated and immediate test-retest with alternate forms 
correlations also were calculated. The effects of age 
were controlled through a partial correlation pro¬ 
cedure. The two KR-21 coefficients are .90 and .91, 
respectively, and the immediate alternate forms coef¬ 
ficient is .87. 

Reliability with Subjects Who Are Dyslexic and 
Reading Disabled. Salas (1988) administered the 
TONI-2 to 61 subjects diagnosed as dyslexic, being 
served in remedial reading programs, or referred for 
the evaluation of severe reading problems. KR-21 
coefficients were calculated and the effects of age were 
controlled through a partial correlation procedure. 
The resulting KR-21 coefficients were .72 for Form 
A and .83 for Form B. 

Reliability with Subjects Who Are Intellectually 
Gifted. Form A of the TONI-2 was administered to 
114 subjects identified as intellectually gifted. Forty- 
nine students attended high school in Plano, Texas; 
29 attended first grade in Plano; and 36 were enrolled 
in first grade in the Eanes Independent School Dis¬ 
trict in Austin, Texas. Coefficients Alpha were cal¬ 
culated separately for the older and younger age 
groups. The two coefficients are .92 and .91, respec¬ 
tively, for the older and younger age groups. 

Reliability with Subjects Who Do Not Speak 
English. The TONI-2 was administered to three 
groups of subjects who do not speak English as their 
primary language. These include the studies by Lindvig
(1989), whose subjects were classified as Fully English
Proficient, Limited English Proficient, and Non
English Proficient and were given Form A; Garcia
(1988), who tested a large group of Spanish speaking
subjects in Chihuahua, Mexico, with both forms of the
test; and Siriany (1989), whose subjects, Spanish
speakers in Valdivia, Chile, also took both forms. The
results are reported in Table 4.6.

Reliability with Subjects Who Are Head Injured. 
Form A of the TONI-2 was administered to 103 sub¬ 
jects in Pocatello, Idaho, who had suffered from head 
trauma. These subjects ranged in age from 10 to 83 
years (M = 30.8). The KR-21 coefficient for these sub¬ 
jects was .89, as summarized in Table 4.6. 

Test Validity 

Validity research provides the second empirical 
proof of a test’s standardization (Hammill, 1987). It 


also verifies the authors’ assertions about the ways 
that the test can be used and confirms the value of 
the information that the test generates. Validity 
imbues the examiner with confidence that the test 
lives up to its reputation and its claims for utility. 

The data contained in this section form the 
nucleus of a body of validity research concerning the 
TONI-2. The studies cited here were conducted by us 
and by independent researchers. We hope that this 
core of research will grow over the years through the 
addition of independent confirmatory research and 
research that replicates our own investigations. In the 
meantime, though, we have provided sufficient pre¬ 
liminary evidence to support the validity of the 
TONI-2 for its stated purposes. This is by no means 
generic, all-purpose validity, and examiners who use 
the TONI-2 to satisfy different appraisal goals than 
those set forth in this manual must accept respon¬ 
sibility for establishing the validity of the TONI-2 
for those additional purposes. 

There are three major theoretical categories of 
validity: content validity, criterion related validity, 
and construct validity. Studies of content validity are 
undertaken to determine if the items that make up 
a test are representative examples of the construct 
that the test measures. We have already provided in 
this chapter both logical and empirical evidence of the 
content validity of the TONI-2; that is, we defined 
the construct being measured, we described our pro¬ 
cedures for amassing a pool of items from the lit¬ 
erature and from knowledgeable professionals, and 
we supplied data to document the validity of the items 
comprising the final version of the test. 

Criterion related validity is concerned with the 
relationship between test performance and similar 
external measures. Investigations of this type assess 
whether a test, such as the TONI-2, correlates as 
expected with these external criteria or can be used 
to predict future performance on the criteria. 
Although we do not report any predictive studies 
using the TONI-2, we have hypothesized the presence 
of significant relationships between the TONI-2 and 
a variety of contemporaneous measures of general 
intelligence and aptitude. 

Construct validity focuses on the theoretical fac¬ 
tors that are presumed to underlie test performance 
or that have been incorporated directly into the struc¬ 
ture of the test. Of particular interest are the rela¬ 
tionships observed among a set of variables, such as 
age, intelligence, and specific kinds of achievement. 
Will test performance improve with age or remain 
fairly stable? Will brighter test subjects earn better 
scores? Will individuals who are better readers or who 
have greater command of the English language earn 
higher scores? These are the kinds of questions that 
are addressed by construct validity research. 





For the TONI-2, we have formulated hypotheses
regarding (a) the relationship between the test scores 
and developmental factors, intelligence, and academic 
achievement; (b) the interrelationship of the two 
forms of the TONI-2; (c) the ability of the test to dis¬ 
criminate among intellectually gifted, normal, and 
retarded individuals; (d) the patterns of performance 
found among individuals with dyslexia, individuals 
with learning disabilities, and individuals who do not 
speak English; (e) the multiple correlation and regression
patterns associated with the test; and (f) the
factor structures that are inherent in the test itself. 

The three categories of content validity, criterion 
related validity, and construct validity are by no 
means discrete. In fact, there is considerable dis¬ 
agreement among psychometrists about the kinds of 
research that in practice would be assigned to each 
of the categories. “Authorities agree more on the 
definitions of the types of validity than they agree on 
which analyses and research designs relate to each 
type of validity” (Hammill, Brown, & Bryant, 1989, 
p. 12). Anastasi (1988), in her review of the various 
categories of validity research, concludes that all 
validity studies are actually designed to assess the 
degree to which a test measures particular attributes, 
traits, or constructs and therefore can be subsumed 
under the superordinate category of construct valid¬ 
ity. For this reason, we have elected not to use the 
customary triptych of content validity, criterion 
related validity, and construct validity as the frame¬ 
work for this discussion of the TONI-2's validity. In¬ 
stead, we will proceed very simply by identifying the 
relevant concepts, hypothesizing anticipated findings, 
subjecting the hypotheses to empirical investigation, 
and reporting the results. 

We should first state, though, that the vast major¬ 
ity of our validity data will be reported as correlation 
coefficients. How large must these coefficients be? It 
will not surprise readers to learn once again that the 
various authorities have several opinions on this mat¬ 
ter! Anastasi (1988), for instance, has concluded that 
coefficients as low as .20 are sometimes acceptable. 
Garrett (1954) is more rigorous; he wants coefficients 
to reach the .40s, at least. Hammill, Brown, and 
Bryant (1989) review the major sources and conclude 
that statistically significant coefficients of .35 and 
higher can be accepted as support for a test’s validity. 
We will adopt their guidelines for interpreting the 
TONI-2 validity data. 

One further note is in order before beginning a 
discussion of the various validity studies cited in this 
section. Some of these studies used the original 1982 
TONI and others used the revised 1990 TONI-2. Items
1-50 are identical on both versions of the test. The 
TONI-2 differs in content only by the addition of 
Items 51-55. The test items are arranged in order of 


difficulty, which means that these five new items are 
more difficult than the first 50 items. Since ceiling 
effects do not appear to have occurred with any of the 
research using the first edition of the TONI as a 
criterion measure, one can be confident that the same 
results would have been achieved had the TONI-2 
been administered. For convenience, the acronym 
TONI-2 is used throughout this section; readers can 
tell by the date of the research which edition of the 
TONI was used. 

Specifically, the ensuing discussion of validity 
includes investigations that demonstrate (a) the rela¬ 
tionship of the test to chronological age, (b) the rela¬ 
tionship of the test to measures of achievement, (c) 
the relationship of the test to measures of aptitude 
and general intelligence, (d) the multiple correlation 
and regression patterns associated with the test, (e) 
the factor structure of the test, (f) the ability of the 
test to discriminate among groups of subjects known 
to vary in certain qualities, (g) the relationship of the 
test items to the two forms, and (h) the interrelation¬ 
ship of the two forms of the test. 

Correlation with Chronological Age 

Performance on the TONI-2 should be strongly 
related to age. In the normal developmental process, 
the human organism becomes more proficient at prob¬ 
lem solving, and that should be reflected in higher 
raw scores for older subjects. 

An examination of the normative tables in the 
Appendix of this manual reveals that the mean raw 
scores for both forms of the TONI-2 increase steadily 
with age until the late teens. At that point, they 
plateau and then begin to decrease slightly after the 
mid to late 20s. This follows the developmental pat¬ 
tern of intelligence reported by other test construc¬ 
tors and researchers, notably by Wechsler (1958) and 
Spearman (1923). 

As an additional test of this hypothesis, we cor¬ 
related age in months with the raw scores for Form 
A and Form B. The resulting coefficients were .38 and 
.37, respectively, large enough to confirm the hypoth¬ 
esis. These coefficients are no doubt depressed some¬ 
what by the phenomenon of falling scores through the 
adult years. 

Correlation with Measures 
of Achievement 

We would expect a positive, but low to moderate 
relationship between the TONI-2 scores and meas¬ 
ures of academic achievement. Sattler (1988) reviewed 
this body of literature and concluded that intelligence 





test scores are good predictors of the achievement of 
formally taught skills. This type of achievement 
usually is measured by tests of academic achieve¬ 
ment, grades earned in school, and other direct 
measures. The TONI-2, however, is not a traditional 
paper-and-pencil test of aptitude and one would 
therefore anticipate that the correlation coefficients 
associated with the TONI-2 would be somewhat 
lower than those associated with more traditional 
types of tests. 

To test this hypothesis, the TONI-2 Form A and 
Form B quotients were correlated with standard 
scores from several measures of achievement: the 
Diagnostic Achievement Battery (DAB) (Newcomer & 
Curtis, 1984); the Diagnostic Achievement Test for 
Adolescents (DATA) (Newcomer & Bryant, 1991); the 
Diagnostic Achievement Test for Adolescents, Second 
Edition (DATA-2) (Newcomer & Bryant, in press); the 
Formal Reading Inventory (FRI) (Wiederholt, 1986); 
the Iowa Tests of Basic Skills (ITBS) (Hieronymus & 
Hoover, 1985); the SRA Achievement Series (SRA) 
(Naslund, Thorpe & Lefever, 1978); the Stanford 
Achievement Tests, (SAT) (Madden & Gardner, 1972); 
the Scholastic Abilities Test for Adults (SATA) 
(Bryant, Patton & Dunn, in press); the Test of Reading 
Comprehension, Revised (TORC-R) (Brown, Hammill 
& Wiederholt, 1986); the Test of Written Language, 
Second Edition (TOWL-2) (Hammill & Larsen, 1988); 
the Wide Range Achievement Test (WRAT) (Jastak & 
Jastak, 1978); and the Wide Range Achievement Test, 
Revised (WRAT-R) (Jastak & Jastak, 1984). The use
of standard scores served as a control for age. All of
the correlations were corrected for attenuation in the 
criterion variable only. 
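A correction for attenuation in the criterion only is conventionally made by dividing the observed correlation by the square root of the criterion measure's reliability. The sketch below assumes that conventional formula. The function name and the numbers are hypothetical and are not taken from the studies reported here.

```python
from math import sqrt


def correct_for_criterion_attenuation(r_xy, criterion_reliability):
    """Estimate the test-criterion correlation if the criterion were
    measured without error; the test's own unreliability is left in."""
    return r_xy / sqrt(criterion_reliability)


# Hypothetical: observed r = .45, criterion reliability = .81.
print(round(correct_for_criterion_attenuation(0.45, 0.81), 2))  # 0.5
```

Because the test's own reliability is not entered into the formula, the corrected coefficient still reflects measurement error in the TONI-2 scores themselves.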

The results of eight studies correlating the 
TONI-2 quotients to these measures of achievement 
are reported in Table 4.7. The correlations are 
organized into the categories of overall achievement, 
reading and language arts, mathematics, writing, 
spelling, and other (i.e., science, social studies, and 
reference skills). Mean coefficients in each of these 
categories were computed using Fisher’s z-transfor- 
mation method. Descriptions of the subject samples 
tested in these eight studies follow. 
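The averaging step can be sketched as follows, assuming a simple unweighted mean of the z-transformed coefficients; the manual does not state whether the coefficients were weighted by sample size. The function name is hypothetical. The three input values are the "Other" coefficients listed for Form A in Table 4.7 (.45, .33, and .56), for which the table reports a mean of .45.

```python
from math import atanh, tanh


def mean_correlation(coefficients):
    """Average correlations via Fisher's z-transformation (unweighted)."""
    z_values = [atanh(r) for r in coefficients]   # r -> z
    mean_z = sum(z_values) / len(z_values)
    return tanh(mean_z)                           # z -> r


other_form_a = [0.45, 0.33, 0.56]   # Science, Social Studies, Reference Skills
print(round(mean_correlation(other_form_a), 2))  # 0.45
```

Averaging in the z metric rather than averaging the raw coefficients compensates for the skewed sampling distribution of r at higher values.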

In Study #1, Brown, Sherbenou, and Johnsen
(1982) tested 25 normal subjects ranging in age from
7.1 to 10.3 years (M = 8.1 years). Their sample
included 13 males and 12 females of whom 21 were
white, 3 were black, and 1 was of other racial origins.

In Study #2, Brown, Sherbenou, and Johnsen
(1982) tested 52 normal subjects ranging in age from
8.1 to 12.6 years (M = 10.8 years). Their sample
included 27 males and 25 females of whom 39 were
white and 13 were black.

In Study #3, Brown, Sherbenou, and Johnsen
(1982) tested 35 normal subjects ranging in age from
7.0 to 11.8 years (M = 9.6 years). Their sample
included 18 males and 17 females of whom 30 were
white and 5 were black.

In Study #4, Brown, Sherbenou, and Johnsen
(1982) tested 16 learning disabled subjects ranging
in age from 10.7 to 13.5 years (M = 12.1 years). Their
sample included 12 males and 4 females of whom 10
were white and 6 were black.

In Study #5, Brown, Sherbenou, and Johnsen 
(1982) tested 16 deaf subjects ranging in age from 16.1 
through 18.8 years (M = 17.9 years). This sample 
included 12 males and 4 females of whom 14 were 
white and 1 was black. 

In Study #6, Bryant and Pearson (1990) tested 94
normal subjects ranging in age from 6.0 through 53.0 
years (M = 23.9 years). This sample included 39 males 
and 55 females, of whom 91 were white and 3 were 
of other racial origins. 

In Study #7, we tested 40 learning disabled 
students in the Dallas-Fort Worth Metroplex. The 
group was predominantly Caucasian. 

In Study #8, Hadad (1986) tested 66 learning
disabled subjects with a mean age of 9.4 years (SD 
= 1.8). His sample included 51 males and 15 females 
of whom 41 were white and 25 were black. 

In Study #9, Salas (1989) tested 61 dyslexic subjects
with a mean age of 10.4 years. Her sample
included 48 males and 13 females of whom 53 were 
white and 8 were black. 

The coefficients generally exceed the .35 criterion 
and support the hypothesis. The mean coefficients are 
.62 (A) and .85 (B) for overall achievement, .60 (A) and 
.65 (B) for reading and language arts, .54 (A) and .71 
(B) for mathematics, .44 (A) for writing, .33 (A) for 
spelling, and .45 (A) for other types of achievement. 
These fall into the hypothesized low to moderate 
range. The importance of these findings is reinforced 
by the wide age range of test subjects and by the 
similarity of findings with populations of subjects who 
are normal, learning disabled, dyslexic, and deaf. 

Correlation with Measures of Aptitude 
and General Intelligence 

We would expect a strong, positive relationship
between the TONI-2 scores and measures of aptitude
and general intelligence. Even though each of these
instruments measures a different aspect of intelligent
behavior, they are all tapping the same overall construct.
Measures that are heavily language based
probably will yield correlations in the low moderate 
range, while those that are comprehensive or nonver¬ 
bal measures will yield higher coefficients. 

To test this hypothesis, the TONI-2 Form A and 
Form B quotients were correlated with standard 





TABLE 4.7

Correlation of the TONI-2 with Various Measures of Achievement
(Decimals Omitted)

Criterion                   Study  Overall       Reading &     Mathematics   Writing       Spelling      Other
Measure                     #*     Achievement   Language Arts
                                   A     B       A     B       A     B       A     B       A     B       A     B

DAB Total Achievement       8
DATA Overall Achievement    8
DATA-2
  Overall Achievement       6      .57
  Reading                   6                    .52
  Mathematics               6                                  .46
  Writing                   6                                                .40
  Reading Comprehension     6                    .48
  Math Computation          6                                  .48
  Math Reasoning            6                                  .41
  Science                   6                                                                            .45
  Social Studies            6                                                                            .33
  Reference Skills          6                                                                            .56
  Word Identification       6                    .51
  Spelling                  6                                                              .36
  Writing Composition       6                                                .33
FRI Silent Reading          8      **
ITBS
  Language                  1                    80    87
                            4                    77    85
  Reading Comprehension     1                    76    89
                            2                    54    56
                            4                    63    76
  Mathematics               1                                  68    78
                            2                                  65    64
                            4                                  64    54
SRA
  Composite                 3      75    85
  Reading                   3                    70    80
  Mathematics               3                                  75    80
  Language Arts             3                    74    78
SAT
  Reading Comprehension     5                    80    68
  Mathematics Concepts      5                                  81    72
SATA
  Total Achievement         6      .56
  Mathematics               6                                  .36





TABLE 4.7. Continued.

Criterion                   Study  Overall       Reading &     Mathematics   Writing       Spelling      Other
Measure                     #*     Achievement   Language Arts
                                   A     B       A     B       A     B       A     B       A     B       A     B

SATA (Continued)
  Reading                   6                    44
  Writing                   6                                                67
  Achievement Screener      6      .60
  Math Calculation          6                                  39
  Reading Vocabulary        6                    64
  Math Application          6                                  33
  Writing Mechanics         6                                                58
  Writing Composition       6                                                54
  Reading Comprehension     6                    26
TORC-R
  General Vocabulary        7                    41
  Syntactic Similarities    7                    33
  Paragraph Reading         7                    36
  Sentence Sequencing       7                    40
  Reading Comprehension     7                    46
                            9                    27
TOWL-2
  Overall Written Language  9                                                14
WRAT
  Reading                   8                    33
  Spelling                  8                                                              33
  Arithmetic                8                                  38
WRAT-R
  Reading                   6                    28
  Spelling                  6                                                              30
  Arithmetic                6                                  34

Mean Correlations
  All Studies                      .62   .85     .60   .65     .54   .71     .44           .33           .45

*1. Brown, Sherbenou, & Johnsen (1982), normal sample
 2. Brown, Sherbenou, & Johnsen (1982), normal sample
 3. Brown, Sherbenou, & Johnsen (1982), normal sample
 4. Brown, Sherbenou, & Johnsen (1982), LD sample
 5. Brown, Sherbenou, & Johnsen (1982), deaf sample
 6. Bryant & Pearson (1990), normal sample
 7. Brown, Sherbenou, & Johnsen (1990), LD sample
 8. Hadad (1986), LD sample
 9. Salas (1989), dyslexic sample




scores from several measures of aptitude, general
intelligence, and developmental abilities: the Detroit
Tests of Learning Ability-Adult Edition (DTLA-AE)
(Hammill & Bryant, in press); the Kaufman Assessment
Battery for Children (KABC) (Kaufman & Kaufman,
1983); the Leiter International Performance
Scale (LIPS) (Leiter, 1948); the Otis-Lennon Mental
Ability Test (OLMAT) (Otis & Lennon, 1970); The
Quick Test (QT) (Ammons & Ammons, 1962); the
Scholastic Abilities Test for Adults (SATA) (Bryant,
Patton & Dunn, in press); the Screening Assessment
for Gifted Elementary Students (SAGES) (Johnsen &
Corn, 1987); the Slosson Intelligence Test for Children
and Adults-Revised (SIT-R) (Slosson, 1985); the Standard
Progressive Matrices (SPM) (Raven, 1938); the
Test of Language Development-Intermediate, Second
Edition (TOLD-2 Intermediate) (Hammill & Newcomer,
1988); the Test of Language Development-
Primary, Second Edition (TOLD-2 Primary)
(Newcomer & Hammill, 1988); the Wechsler Adult
Intelligence Scale-Revised (WAIS-R) (Wechsler, 1981);
the Wechsler Intelligence Scale for Children-Revised
(WISC-R) (Wechsler, 1974); and the Woodcock-
Johnson Psycho-Educational Battery (WJPEB) (Woodcock,
1977). The use of standard scores served as a
control for age. All of the correlations were corrected
for attenuation in the criterion variable only.

The results of 13 studies correlating the TONI-2
quotients to these measures of achievement are
reported in Table 4.8. The correlations are organized
into the categories of overall aptitude, language-
based aptitude, and nonlanguage aptitude. Mean
coefficients in each of these categories were computed
using Fisher’s z-transformation method. Descriptions
of the subject samples tested in these 13 studies
follow.

In Study #1, Bond and Kennon (1982) tested 71 
normal subjects ranging in age from 7.0 through 18.9 
years. Their sample included 43 males and 28 
females, of whom 52 were white and 19 were black. 

In Study #2, Brown, Sherbenou, and Johnsen 
(1982) tested 25 normal subjects ranging in age from 
7.1 to 10.3 years (M = 8.1 years). Their sample 
included 13 males and 12 females of whom 21 were
white, 3 were black, and 1 was of other racial origins. 

In Study #3, Brown, Sherbenou, and Johnsen
(1982) tested 16 deaf subjects ranging in age from 16.1
to 18.8 years (M = 17.9 years). Their sample included
12 males and 4 females of whom 15 were white and
1 was black.

In Study #4, Brown, Sherbenou, and Johnsen
(1982) tested 30 deaf subjects ranging in age from 14.8
to 17.2 years (M = 16.1 years). Their sample included
26 males and 4 females of whom 28 were white and
2 were black.

In Study #5, Brown, Sherbenou and Johnsen 
(1982) tested 16 learning disabled subjects ranging 


in age from 10.7 to 13.5 years (M = 12.1 years). Their 
sample included 12 males and 4 females of whom 10 
were white and 6 were black. 

In Study #6, Brown, Sherbenou, and Johnsen 
(1982) tested 16 deaf subjects ranging in age from 16.1 
through 18.8 years (M = 17.9 years). This sample 
included 12 males and 4 females of whom 14 were 
white and 1 was black. 

In Study #7, Brown, Sherbenou, and Johnsen (1982)
tested 10 educable mentally retarded subjects rang¬ 
ing in age from 8.1 through 13.7 years (M = 11.5 
years). This sample included 7 males and 3 females, 
of whom 4 were white and 6 were black. 

In Study #8, Bryant and Pearson (1990) tested 94
normal subjects ranging in age from 6.0 through 53.0 
years (M = 23.9 years). This sample included 39 males 
and 55 females, of whom 91 were white and 3 were 
of other racial origins. 

In Study #9, Brown, Sherbenou, and Johnsen
(1990) tested 139 gifted subjects ranging in age from 
6.0 through 17.8 years (M = 8.4 years). This sam¬ 
ple included 65 males and 74 females, of whom 134 
were white, 1 was black, and 4 were of other racial 
origins. 

In Study #10, we tested 40 learning disabled
students in the Dallas-Fort Worth Metroplex. The 
group was predominantly Caucasian. 

In Study #11, we tested 103 head injury patients 
in Pocatello, Idaho. They ranged in age from 10 to 83 
years (M = 30.81). The sample included 97 white sub¬ 
jects, 2 black subjects, and 3 subjects of other racial 
origins. 

In Study #12, Hadad (1986) tested 66 learning 
disabled subjects with a mean age of 9.4 years (SD 
= 1.8). His sample included 51 males and 15 females, 
of whom 41 were white and 25 were black. 

In Study #13, Nasca (1988) tested 221 gifted sub¬ 
jects attending kindergarten through grade six. 

In Study #14, Salas (1989) tested 61 dyslexic subjects
with a mean age of 10.4 years. Her sample
included 48 males and 13 females of whom 53 were 
white and 8 were black. 

In Study #15, Vance (1988) tested 89 mildly handi¬ 
capped subjects ranging in age from 6.3 through 16.8 
years (M = 11.1 years). His sample included 61 males 
and 28 females, of whom 52 were white and 37 were 
black. 

The vast majority of the coefficients exceed the .35
criterion and support the hypothesis. They provide
strong evidence of the validity of the TONI-2,
evidence that is reinforced by the similarity of findings
across wide age intervals and across populations not
only of normal subjects, but also of subjects who are
learning disabled, dyslexic, deaf, mildly handicapped,
mentally retarded, and intellectually gifted.
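The comparison against the .35 criterion can be sketched in a few lines. This is only an illustration, not the authors' procedure: the function names are hypothetical, the sample cells are the Study #8 entries listed under DATA-2 in Table 4.8, and treating a coefficient equal to .35 as meeting the criterion is an assumption.

```python
from typing import Dict, List, Optional

# Table 4.8 prints coefficients with decimals omitted, so "44" means .44.
CRITERION_HUNDREDTHS = 35  # the .35 criterion named in the text


def parse_cell(cell: str) -> Optional[int]:
    """Return the coefficient in hundredths, or None for an "NS" entry."""
    text = cell.strip()
    if text.upper() == "NS":
        return None
    return int(text)


def summarize(cells: List[str]) -> Dict[str, int]:
    """Count cells that meet the criterion, fall below it, or are NS."""
    counts = {"meets": 0, "below": 0, "not_significant": 0}
    for cell in cells:
        value = parse_cell(cell)
        if value is None:
            counts["not_significant"] += 1
        elif value >= CRITERION_HUNDREDTHS:  # assumption: .35 itself qualifies
            counts["meets"] += 1
        else:
            counts["below"] += 1
    return counts


# Study #8 entries listed under DATA-2 in Table 4.8.
data2_cells = ["35", "NS", "33", "NS", "43", "NS"]
print(summarize(data2_cells))
```

Working in hundredths keeps the comparison in integers, which sidesteps floating-point edge cases at the .35 boundary.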



TABLE 4.8

Correlation of the TONI-2 with Various Measures
of Aptitude, General Intelligence, and Developmental Abilities
(Decimals Omitted)

                                           Overall    Language-Based   Nonlanguage
                                           Measures   Measures         Measures
Criterion Measure               Study #*   A  B       A  B             A  B

DTLA-AE
  General Intelligence          8           44
DATA-2
  Listening                     8                      35
  Speaking                      8                      NS
  Vocabulary                    8                      33
  Synonyms/Antonyms             8                      NS
  Listening Grammar             8                      43
  Sentence Imitation            8                      NS
KABC
  Mental Processing             8           53
  Sequential Processing         8           52
  Simultaneous Processing       8           46
  Nonverbal Scale               8                                       47
  Hand Movements                8                                       40
  Number Recall                 8                      58
  Word Order                    8                      49
  Gestalt Closure               8           NS
  Triangles                     8                                       53
  Matrix Analogies              8                                       38
  Spatial Memory                8                                       51
  Photo Series                  8                                       NS
LIPS, Total                     4                                      89 83
OLMAT, Total                    2                     86 81
QT, Total                       15                     67
SATA
  General Aptitude              8           60
  Nonverbal Reasoning           8                                       53
  Verbal Reasoning              8                      43
  Quantitative Reasoning        8                                       64
SAGES
  Program Related D             14          52
                                10         52 54
  Reasoning                     14                     50               47
  School-Acquired Information   14                                      45
SIT-R, Total                    1                      71
                                13 (Kdg)               24
                                13 (1st)               57
                                13 (2nd)               21
                                13 (3rd)               40
                                13 (4th)               15
                                13 (5th)               36
                                13 (6th)               24
SPM, Total                      3                                      92 92
TOLD-2 Intermediate
  Sentence Combining            10                    49 60
  Generals                      10                    36 NS
  Spoken Language               10                    50 60
                                14                     49
TOLD-2 Primary, Total           14                     24
WAIS-R, Dyad Version
  Full Scale                    8           48
WISC-R
  Dyad Version Full Scale       8           30
  Full Scale                    1           72
                                5          62 92
                                6          95 94
                                7          76 87
                                10         71 75
                                12          44
                                15          66
  Performance Scale             1                                       77
                                5                                      62 87
                                6                                      70 74
                                7                                      89 80
                                10                                     60 60
                                12                                      44
                                15                                      63
  Verbal Scale                  1                      61
                                5                     69 70
                                6                     78 73
                                7                     46 63
                                12                     34
                                15                     53
WJPEB
  Verbal Cluster                11          80
  Reasoning Cluster             11          79
  Perceptual Speed Cluster      11                                      72
  Memory Cluster                11                     80
                                                                        46

Mean Correlations, All Studies             .64 .85    .50 .69          .66 .81


*  1. Bond & Kennon (1982), normal sample
   2. Brown, Sherbenou, & Johnsen (1982), normal sample
   3. Brown, Sherbenou, & Johnsen (1982), deaf sample
   4. Brown, Sherbenou, & Johnsen (1982), deaf sample
   5. Brown, Sherbenou, & Johnsen (1982), LD sample
   6. Brown, Sherbenou, & Johnsen (1982), LD sample
   7. Brown, Sherbenou, & Johnsen (1982), EMR sample
   8. Bryant & Pearson (1990), normal sample
   9. Brown, Sherbenou, & Johnsen (1990), gifted sample
  10. Brown, Sherbenou, & Johnsen (1990), LD sample
  11. Brown, Sherbenou, & Johnsen (1990), head injured sample
  12. Haddad (1986), LD sample
  13. Nasca (1988), gifted sample
  14. Salas (1989), dyslexic sample
  15. Vance (1988), mildly handicapped sample


Multiple Correlation/Regression

Vance, Hankins, and Reynolds (1988) used a multiple
correlation procedure to determine the utility of
the TONI-2 and the Quick Test (QT) (Ammons &
Ammons, 1962) in predicting the Full Scale IQ (FSIQ)
from the Wechsler Intelligence Scale for Children-
Revised (WISC-R) (Wechsler, 1974). Their regression
data are reproduced below in Table 4.9. They concluded
that the TONI-2 and the QT were equally good
predictors of the WISC-R FSIQ. A TONI-2 IQ of 100
would predict a WISC-R FSIQ of 95.7 within a 68%
confidence interval.
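The single-predictor equations in Table 4.9 can be applied as in the sketch below. The function and dictionary names are hypothetical, and reading the 68% interval as the point estimate plus or minus one SEm is an assumption about how the interval was formed. With the coefficients as printed in Table 4.9, a TONI IQ of 100 yields an estimate of about 95.3, slightly below the 95.7 quoted in the text; the gap may reflect rounding of the published coefficients.

```python
from typing import Tuple


def predict_fsiq(predictor_iq: float, slope: float, intercept: float,
                 sem: float) -> Tuple[float, float, float]:
    """Return (lower, estimate, upper) for a predicted WISC-R FSIQ.

    The band is estimate +/- one SEm, taken here as the 68% interval
    (an assumption, not something the manual spells out).
    """
    estimate = slope * predictor_iq + intercept
    return estimate - sem, estimate, estimate + sem


# Coefficients as printed in Table 4.9 (Vance, Hankins, & Reynolds, 1988).
TONI_EQUATION = {"slope": 0.77, "intercept": 18.3, "sem": 9.0}
QT_EQUATION = {"slope": 0.59, "intercept": 34.6, "sem": 8.9}

low, est, high = predict_fsiq(100, **TONI_EQUATION)
print(f"TONI IQ 100 -> FSIQ {est:.1f} (68% band {low:.1f} to {high:.1f})")

low, est, high = predict_fsiq(100, **QT_EQUATION)
print(f"QT IQ 100   -> FSIQ {est:.1f} (68% band {low:.1f} to {high:.1f})")
```

The two-predictor equation is left out of the sketch because only the single-predictor case is worked through in the text.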


Factor Structure

Parmar (1988) factor analyzed the results of the
TONI-2, using as her subjects 91 normally achieving
students in Lucknow, India. They ranged in age
from 84 to 120 months (M = 95.9 months). Her data
are reported in Table 4.10. She failed to find any
strong subfactors, but she did report minor differences
in item loadings based on the performances of Indian
and American subjects.

Discrimination Among Groups

If the TONI-2 is a valid instrument, it will
discriminate (or, in some instances, fail to
discriminate) between groups of subjects with particular
attributes.

For instance, the test should discriminate on the 
basis of intellectual ability. Subjects who are known 
to be mentally retarded should earn lower scores than 
subjects of normal intelligence. The normal subjects 
should, in turn, earn lower scores than those known
to be intellectually gifted. Mean scores associated 
with these three types of subjects are reported in 
Table 4.11, which also describes the characteristics 
of the subjects involved in each study. In fact, we find 
that the mentally retarded groups earned mean scores 
in the mid 60s while the gifted subjects earned mean 
scores around 115. Normal performance, of course, 
would be quotients of 100. 
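To see how far such group means sit from normal performance, a quotient can be expressed as a z score and an approximate percentile rank. The sketch below assumes a quotient scale with a mean of 100 and a standard deviation of 15 and a normal distribution of scores; that assumption agrees with Table A, where a quotient of 100 corresponds to the 50th percentile and a quotient of 85 to the 16th. The function names are hypothetical.

```python
import math


def quotient_to_z(quotient: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Express a quotient as standard deviation units from the mean."""
    return (quotient - mean) / sd


def quotient_to_percentile(quotient: float, mean: float = 100.0,
                           sd: float = 15.0) -> float:
    """Approximate percentile rank under a normal curve (an assumption)."""
    z = quotient_to_z(quotient, mean, sd)
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Illustrative quotients near the group means discussed in the text.
for label, quotient in [("mid 60s", 65), ("normal", 100), ("gifted", 115)]:
    print(f"{label:8s} Q={quotient:3d}  z={quotient_to_z(quotient):+.2f}  "
          f"percentile~{quotient_to_percentile(quotient):.1f}")
```

Published norms tables remain the authority for reporting scores; the normal-curve figure is only a quick approximation for comparing groups.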

We also would expect subjects with thinking dis¬ 
orders to earn lower than normal TONI-2 quotients. 
Table 4.11 also contains mean scores for subjects who 
are learning disabled, dyslexic, and mildly handi¬ 
capped and for subjects who have been referred for 
psychological services because of poor school perform¬ 
ance. These groups tended to score in the 80s and 90s, 
significantly lower than normal but still higher than 
the mentally retarded group. 

TABLE 4.9

Regression Equation for WISC-R Full Scale IQ by QT and TONI-2

WISC-R FSIQ from QT:
    FSIQ = .59 (QT IQ) + 34.6   (SEm 8.9)

WISC-R FSIQ from TONI:
    FSIQ = .77 (TONI IQ) + 18.3   (SEm 9.0)

WISC-R FSIQ from QT and TONI:
    FSIQ = .45 (TONI) + 3.6 (QT) + 17.7   (SEm 8.2)

From Vance, Hankins, & Reynolds (1988).


One would not expect that individuals who do not
speak English would in any way be handicapped on
the TONI-2 if such variables as intelligence,
developmental maturation, socioeconomic status, etc. are
held equal by the researcher. Lindvig (1990) studied
subjects attending school in California whose primary
language was not English. Parmar (1988), who
studied Indian students, found only seven items on
which the item difficulty varied significantly for
Indian and American subjects. Siriany (1989) found
essentially normal levels of achievement (i.e., mean
raw scores approximating those reported in the
TONI-2 normative tables) among his Spanish-speaking
subjects in Valdivia, Chile. Garcia (1988) found
lower mean scores for her Spanish-speaking subjects
in Chihuahua, Mexico, but the order of item difficulty
was not significantly different from that reported in
the TONI-2 manual. Vance, Hankins, and Brown
(1988) found no differences with regard to either
gender or ethnicity.

These data are summarized in Table 4.11 and the 
subjects tested in the various studies have been 
described in detail elsewhere in this chapter. 

Intercorrelation of the Two Forms 

The relationship between components of a test
provides some indication of the construct validity of
the measure. For the TONI-2, equivalence of Forms
A and B is evidence that the two instruments are
strongly related to each other and that they are
measuring a similar global construct.

Earlier, in the “Test Reliability” section of this
manual, we provided evidence that the two forms of
the test are, indeed, strongly intercorrelated. Both the
immediate test-retest correlations reported in Table
4.5 and the delayed test-retest coefficient of .85 are
quite high, mostly in the high .80s and .90s. These
data reflect on the validity of the TONI-2, as well as
its reliability.

Relationship of Items to the Two Forms 

The relationship between individual test items
and the total score of a test reflects the degree to
which each item taps specific constructs, abilities, or
attributes. The TONI-2 item discrimination coefficients
reported earlier in Table 4.1 are statistically
significant and of such a magnitude that one can conclude
that the items assigned to each form are measuring
the same discrete constructs.

Further evidence is found in Table 4.12, which 
reports the point-biserial item-to-total correlations of 
Form A items with the Form B total score and of Form 
B items with the Form A total score based on the per¬ 
formance of the 400 item analysis subjects. The 
medians and ranges of these correlations are very 
similar to those reported in Table 4.1 between each 
form’s items and its own total score. One can conclude 
not only that the two forms of the test are equivalent, 
but also that all 110 items of the TONI-2 are meas¬ 
uring the same construct. 
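For readers who want to reproduce this kind of cross-form analysis on their own data, the point-biserial correlation between one dichotomously scored item and a total score can be computed as sketched below. The function name and the eight-subject sample are hypothetical and are not data from the item analysis; the formula is the standard one, using the population standard deviation of the total scores.

```python
import math
from typing import Sequence


def point_biserial(item_scores: Sequence[int],
                   total_scores: Sequence[float]) -> float:
    """Point-biserial r between a 0/1 item and a total score.

    r = (mean total of passers - mean total of failers) / SD * sqrt(p * q),
    where SD is the population SD of all totals and p is the pass rate.
    """
    if len(item_scores) != len(total_scores):
        raise ValueError("need one total score per item score")
    if any(score not in (0, 1) for score in item_scores):
        raise ValueError("item scores must be 0 (fail) or 1 (pass)")
    passed = [t for i, t in zip(item_scores, total_scores) if i == 1]
    failed = [t for i, t in zip(item_scores, total_scores) if i == 0]
    if not passed or not failed:
        raise ValueError("item must be passed by some subjects and failed by others")
    n = len(total_scores)
    mean_all = sum(total_scores) / n
    sd = math.sqrt(sum((t - mean_all) ** 2 for t in total_scores) / n)
    if sd == 0:
        raise ValueError("total scores have no variance")
    p = len(passed) / n
    mean_passed = sum(passed) / len(passed)
    mean_failed = sum(failed) / len(failed)
    return (mean_passed - mean_failed) / sd * math.sqrt(p * (1 - p))


# Hypothetical data: one Form A item against Form B total scores.
item_responses = [1, 1, 1, 0, 0, 1, 0, 0]
alternate_form_totals = [50, 47, 45, 30, 28, 41, 35, 25]
print(round(point_biserial(item_responses, alternate_form_totals), 2))
```

Because the point-biserial is a special case of the Pearson coefficient, the same value results from correlating the 0/1 item column with the totals directly.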


Summary 

The TONI-2 is a highly standardized, norm-referenced
measure of abstract/figural problem solving.
It is intended for use with subjects ages 5-0 through
85-11 years. The two forms of the test were built
through both logical and empirical means, and extensive
evidence of their reliability and validity is
provided.

The 110 items, 55 on each form, were constructed 
following an exhaustive review of the relevant lit¬ 
erature. Preliminary items then were subjected to the 
scrutiny of a professional review panel. Items for the 
final version of the TONI-2 were selected on the basis 
of empirical characteristics, specifically their dis¬ 
criminating power and difficulty. 

The reliability of the TONI-2 appears to be excellent.
Empirical evidence is offered to demonstrate
that the test items are homogeneous and internally
consistent and that the test scores are stable over
time. Reliability also is high with populations of
mentally retarded, deaf, learning disabled, dyslexic,
gifted, and non-English-speaking subjects.






TABLE 4.10

Rotated Factor Pattern Matrix Based on the TONI-2 Performance
of Normally Achieving Indian Subjects

                                 Factor
Item        1        2        3        4        5        6        7

T1      0.000    0.000    0.000    0.000    0.000    0.000    0.000
T2      0.001    0.077    0.032    0.721    0.004    0.263    0.026
T3      0.003    0.030   -0.005    0.546    0.636    0.099   -0.052
T4     -0.004    0.066    0.059    0.627    0.189   -0.108   -0.066
T5      0.019    0.201   -0.034    0.757    0.071    0.263    0.147
T6      0.012    0.146    0.055    0.343   -0.011    0.681    0.011
T7      0.011    0.059    0.049    0.205    0.727    0.307   -0.051
T8      0.020    0.231    0.066   -0.013    0.737   -0.049   -0.157
T9      0.030    0.464    0.012    0.554   -0.029   -0.010    0.063
T10     0.031    0.243   -0.032    0.052    0.277    0.792   -0.064
T11     0.031    0.485    0.057    0.279    0.196    0.030   -0.062
T12     0.030    0.478    0.173    0.240    0.030    0.228   -0.157
T13     0.044    0.623    0.193    0.146    0.091   -0.062   -0.134
T14     0.051    0.465    0.236    0.114    0.067    0.252    0.052
T15     0.041    0.585    0.303    0.052    0.032    0.248   -0.342
T16    -0.098    0.604    0.128    0.044    0.064    0.237    0.105
T17     0.099    0.572    0.346    0.033    0.113   -0.016    0.116
T18     0.086    0.468    0.588   -0.041    0.099    0.072   -0.208
T19    -0.035    0.723   -0.142   -0.018   -0.059   -0.026    0.044
T20     0.127    0.232    0.703    0.005   -0.009    0.144   -0.194
T21     0.209    0.243    0.690    0.041    0.065   -0.047    0.253
T22    -0.050    0.368    0.174   -0.126    0.153    0.051    0.715
T23     0.326    0.154    0.603    0.085   -0.057   -0.102   -0.197
T24     0.724    0.025    0.252    0.019    0.014    0.007    0.026
T25     0.375    0.134    0.648    0.029    0.038   -0.084    0.292
T26    -0.080   -0.044    0.590    0.003    0.055    0.091    0.379
T27     0.995    0.015    0.072    0.006    0.005    0.007    0.005
T28     0.995    0.015    0.072    0.006    0.005    0.007    0.005
T29    -0.000    0.000   -0.000   -0.000    0.000    0.000   -0.000
T30     0.995    0.015    0.072    0.006    0.005    0.007    0.005
T31     0.995    0.015    0.072    0.006    0.005    0.007    0.005
T32     0.995    0.015    0.072    0.006    0.005    0.007    0.005
T33     0.995    0.015    0.072    0.006    0.005    0.007    0.005
T34     0.995    0.015    0.072    0.006    0.005    0.007    0.005
T35    -0.000    0.000    0.000   -0.000    0.000    0.000   -0.000
T36    -0.000    0.000   -0.000   -0.000    0.000    0.000   -0.000
T37     0.995    0.015    0.072    0.006    0.005    0.007    0.005
T38     0.995    0.015    0.072    0.006    0.005    0.007    0.005
T39     0.000    0.000    0.000    0.000    0.000    0.000    0.000
T40         :        :        :        :        :        :        :
T50     0.000    0.000    0.000    0.000    0.000    0.000    0.000

From Parmar (1988).




TABLE 4.11

Mean TONI-2 Scores for Various Groups

                                               Mean Quotient
Group                                 Study*   A      B

Mentally Retarded                     1        66     68
Learning Disabled                     3        102
Dyslexic                              5        93     99
Head Injured                          2        90
Gifted-Secondary                      2        119
Gifted-Elementary                     2        113
Spanish Speaking (Mexico)             3        89     87
Spanish Speaking (Chile)              7        93     98
Indian Speaking-Normal                4        82
Indian Speaking-Mentally Retarded     4        60
Bilingual/Fully English Proficient    6        93
Limited English Proficient            6        92
Non English Proficient                6        81
Black                                 8        82

* 1. Brown, Sherbenou, & Johnsen (1982)
  2. Brown, Sherbenou, & Johnsen (1990)
  3. Garcia (1988)
  4. Parmar (1988)
  5. Salas (1989)
  6. Lindvig (1990)
  7. Siriany (1989)
  8. Vance, Hankins, & Brown (1988)




TABLE 4.12

Medians and Ranges of Item-Total Correlations for Forms A and B Total Scores
with the Alternate Form’s Items
(Decimals Omitted)

                                         Age Intervals

                6-7 years       13-14 years     35-40 years     60-65 years     TOTAL
                (N = 100)       (N = 100)       (N = 100)       (N = 100)       (N = 400)
                Median Range    Median Range    Median Range    Median Range    Median Range

Form A Items/
Form B Total    41     18-64    19     00-69    39     04-76    39     05-77    60     10-80

Form B Items/
Form A Total    43     04-73    21     00-71    47     05-79    46     02-70    58     10-80


Substantial preliminary evidence of the validity
of the TONI-2 has been provided. Data demonstrate
that (a) the test items are representative, (b) the test
scores are strongly related to other measures of aptitude
and general intelligence, (c) the test scores are
related as hypothesized to chronological age and to
measures of achievement, (d) the two forms of the test
are strongly intercorrelated, (e) the test accurately
discriminates among groups of subjects who vary according
to specified attributes, (f) hypothesized multiple
correlation and regression results can be obtained,
and (g) the factor structures underlying the test are
those that were hypothesized.








References 


Adams, G. S. (1976). Measurement and evaluation in
education, psychology and guidance. New York: Holt, Rinehart
& Winston.

Aiken, L. R. (1985). Psychological testing and assessment. 
Boston: Allyn & Bacon. 

American Psychological Association. (1981). Ethical
principles of psychologists. Washington, DC: American
Psychological Association.

American Psychological Association. (1985). Standards for 
educational and psychological tests. Washington, DC: 
American Psychological Association. 

Ammons, R. B., & Ammons, C. H. (1962). The Quick Test: 
Provisional manual. Psychological Reports, 11, 11-161. 

Anastasi, A. (1988). Psychological testing (6th ed.). New 
York: Macmillan. 

Arter, J. A., & Jenkins, J. R. (1979). Differential
diagnosis-prescriptive teaching: A critical appraisal. Review of
Educational Research, 49, 512-555.

Binet, A., & Simon, T. (1905). The development of
intelligence in children. L'Annee Psychologique, 11, 191-244.
(E. S. Kite, Trans., Baltimore, MD: Williams & Wilkins,
1916).

Bloom, A. S., & Raskin, L. M. (1980). A comparison of 
WISC-R and Stanford Binet Intelligence Scale classifica¬ 
tions of developmentally disabled children. Psychology 
in the Schools, 36, 322-323. 

Bloom, A. S., Wagner, M., & Bergman, A. (1980). A com¬ 
parison of intellectually delayed and primary reading 
disabled children on measures of intelligence and 
achievement. Journal of Clinical Psychology, 36, 
788-790. 


Bond, C. L., & Kennon, R. W. (1982). TONI: A language free 
substitute for SIT or WISC-R? Unpublished manuscript, 
Memphis State University, Department of Curriculum 
and Instruction. 

Bourne, L. E. (1963). Some factors affecting strategies used 
in problems of concept formation. American Journal of 
Psychology, 75, 229-238. 

Bourne, L. E. (1967). Learning and utilization of conceptual 
rules. In B. Kleinmutz (Ed.), Memory and the structure 
of concepts. New York: Wiley. 

Bourne, L. E., & Guy, D. E. (1968a). Learning conceptual
rules I: Some inter-rule transfer effects. Journal of
Experimental Psychology, 76, 423-429.

Bourne, L. E., & Guy, D. E. (1968b). Learning conceptual 
rules II: The role of positive and negative instances. Jour¬ 
nal of Experimental Psychology, 77, 488-494. 

Brown, L., & Bryant, B. R. (1984). The why and how of 
special norms. Remedial and Special Education, 5, 
52-61. 

Brown, L., Sherbenou, R. J., & Johnsen, S. (1982). Test of 
Nonverbal Intelligence. Austin, TX: PRO-ED. 

Brown, V. L., Hammill, D. D., & Wiederholt, J. L. (1978). 
Test of Reading Comprehension. Austin, TX: PRO-ED. 

Brown, V. L., Hammill, D. D., & Wiederholt, J. L. (1986). 
Test of Reading Comprehension—Revised. Austin, TX: 
PRO-ED. 

Bryant, B. R., Patton, J., & Dunn, C. (in press). Scholastic 
Abilities Test for Adults. Austin, TX: PRO-ED. 

Bryant, B. R., & Pearson, N. (1990). Intercorrelations among
aptitude and achievement measures. Unpublished 
manuscript, Austin, TX. 





Cronbach, L. J. (1951). Coefficient Alpha and the internal 
structure of tests. Psychometrika, 16, 297-334. 

Cronbach, L. J., & Snow, R. E. (1977). Aptitudes and instruc¬ 
tional methods: A handbook for research on interactions. 
New York: Irvington. 

Dominowski, R. L. (1966). Problem solving as a function of 
relative frequency of correct responses. Psychonomic 
Science, 1, 107-111. 

Duffey, J. B., Salvia, J., Tucker, J., & Ysseldyke, J. (1981).
Nonbiased assessment: A need for operationalism.
Exceptional Children, 47, 427-434.

Dunn, L. M., & Dunn, L. (1981). Peabody Picture Vocabulary 
Test—Revised. Circle Pines, MN: American Guidance 
Service. 

Gagne, R. M. (1959). Problem solving and thinking. Annual 
Review of Psychology, 10, 147-172. 

Galton, F. (1869). Hereditary genius: An inquiry into its laws 
and consequences. London: Macmillan. 

Garcia, O. A. (1988). Investigación del test no verbal
inteligencia (TONI) en la ciudad de Chihuahua. Unpublished
thesis, La Universidad Autonoma de Chihuahua,
Chihuahua, Mexico.

Gardner, H. (1983). Frames of mind. New York: Basic 
Books. 

Garrett, H. E. (1954). Statistics in psychology and education. 
New York: Longmans, Green. 

Glucksberg, S. (1964). Functional fixedness: Problem solu¬ 
tion as a function of observing responses. Psychonomic 
Science, 1, 117-118. 

Guilford, J. P. (1956a). The structure of intellect. Psycho¬ 
logical Bulletin, 53, 267-293. 

Guilford, J. P. (1956b). Fundamental statistics in psychology 
and education (3rd ed.). New York: McGraw-Hill. 

Guilford, J. P. (1967). The nature of human intelligence. New 
York: McGraw-Hill. 

Haddad, F. A. (1986). Concurrent validity of the Test of 
Nonverbal Intelligence with learning disabled children. 
Psychology in the Schools, 23, 361-364. 

Hammill, D. D. (1987). An overview of assessment practices.
In D. D. Hammill (Ed.), Assessing the abilities and
instructional needs of students (pp. 5-37). Austin, TX:
PRO-ED.

Hammill, D. D., Brown, L., & Bryant, B. R. (1989). A
consumer’s guide to tests in print. Austin, TX: PRO-ED.

Hammill, D. D., & Bryant, B. R. (in press). Detroit Tests of
Learning Aptitude—Adult Edition. Austin, TX: PRO-ED. 

Hammill, D. D., & Bryant, B. R. (1986). Detroit Tests of 
Learning Aptitude—Primary. Austin, TX: PRO-ED. 

Hammill, D. D., & Larsen, S. C. (1988). Test of Written 
Language, Second Edition. Austin, TX: PRO-ED. 

Hammill, D. D., & Newcomer, P. L. (1988). Test of Language 
Development—Intermediate, Second Edition. Austin, TX: 
PRO-ED. 

Helmstadter, B. C. (1964). Principles of psychological meas¬ 
urement. New York: Appleton-Century-Crofts. 

Hieronymous, A. N., & Hoover, H. (1985). Iowa Tests of 
Basic Skills. Chicago: Riverside. 


Hollis, J. W., & Donn, P. A. (1979). Psychological report 
writing: Theory and practice (2nd ed.). Muncie, IN: 
Accelerated Development. 

Jastak, J. F., & Jastak, S. R. (1978). Wide Range Achieve¬ 
ment Test. Wilmington, DE: Jastak Assessment Systems. 

Jastak, J., & Wilkinson, G. S. (1984). Wide Range Achieve¬ 
ment Test—Revised. Wilmington, DE: Jastak Assessment 
Systems. 

Jensen, A. (1980). Bias in mental testing. New York: Free 
Press. 

Johnsen, S. K., & Corn, A. L. (1987). Screening Assessment 
for Gifted Elementary Students. Austin, TX: PRO-ED. 

Kaufman, A. S. (1979). Intelligent testing with the WISC-R. 
New York: Wiley. 

Kaufman, A. S., & Kaufman, N. L. (1983). Kaufman
Assessment Battery for Children. Circle Pines, MN: American
Guidance Service.

Kelley, T. L. (1927). Interpretation of educational measure¬ 
ment. Yonkers-on-Hudson, NY: World Press. 

Knoff, H. M. (1983). Personality assessment in the schools: 
Issues and procedures for school psychologists. School 
Psychology Review, 12, 391-398. 

Leiter, R. G. (1948). Leiter International Performance Scale. 
Chicago: Stoelting. 

Lindvig, E. (1990). TONI scores of students with differ¬ 
ent linguistic backgrounds. Unpublished manuscript, 
Bakersfield, CA. 

Lloyd, J. W. (1984). How shall we individualize instruction 
—Or should we? Remedial and Special Education, 5, 
7-15. 

Madden, R., & Gardner, E. F. (1972). Stanford Achievement 
Tests. New York: Harcourt Brace Jovanovich. 

Mercer, J. R. (1976). Pluralistic diagnosis in the evaluation 
of black and chicano children: A procedure for taking 
sociocultural variables into account in clinical assess¬ 
ment. In C. A. Hernandez, M. J. Haug, & N. N. Wagner 
(Eds.), Chicanos: Social and psychological perspectives. 
St. Louis: Mosby. 

Myers, P. I., & Hammill, D. D. (1990). Learning disabilities. 
Austin, TX: PRO-ED. 

Nasca, D. (1988). The use of nonverbal measures of intel¬ 
lectual functioning in identifying gifted children. Unpub¬ 
lished manuscript, State University of New York College 
at Brockport, Department of Educational Adminis¬ 
tration. 

Naslund, R. A., Thorpe, L. P., & Lefever, D. W. (1978). SRA 
Achievement Series. Chicago: Science Research Asso¬ 
ciates. 

Newcomer, P. L., & Curtis, D. (1984). Diagnostic Achieve¬ 
ment Battery. Austin, TX: PRO-ED. 

Newcomer, P. L., & Bryant, B. R. (1986). Diagnostic Achieve¬ 
ment Test for Adolescents. Austin, TX: PRO-ED. 

Newcomer, P. L., & Bryant, B. R. (in press). Diagnostic 
Achievement Test for Adolescents-Second Edition. 
Austin, TX: PRO-ED. 

Newcomer, P. L., & Hammill, D. D. (1988). Test of Language 
Development—Primary, Second Edition. Austin, TX: 
PRO-ED. 

Nunnally, J. S. (1978). Psychometric theory (2nd ed.). New 
York: McGraw-Hill. 





Oosterhof, A. C. (1976). Similarity of various item
discrimination indices. Journal of Educational Measurement, 13,
145-150.

Otis, A. S., & Lennon, R. T. (1970). The Otis-Lennon Men¬ 
tal Ability Test. New York: Harcourt Brace Jovanovich. 

Parmar, R. S. (1988). Cross-cultural validity of the Test of
Nonverbal Intelligence. Unpublished doctoral disserta¬ 
tion, University of North Texas, Denton. 

Rafoth, M. A., & Richmond, B. (1983). Useful terms in
psychoeducational reports: A survey of students, teachers,
and psychologists. Psychology in the Schools, 20,
346-360.

Raven, J. C. (1938). Standard Progressive Matrices: A
Perceptual Test of Intelligence. London: H. K. Lewis.

Raven, J. C. (1947). Coloured Progressive Matrices. London: 
H. K. Lewis. 

Raven, J. C. (1960). Guide to Standard Progressive Matrices.
London: H. K. Lewis. 

Resnick, L. B., & Glaser, R. (1976). Problem solving and
intelligence. In L. B. Resnick (Ed.), The nature of
intelligence (pp. 205-230). Hillsdale, NJ: Erlbaum.

Roscoe, J. T. (1969). Fundamental research statistics for the 
behavioral sciences. New York: Holt, Rinehart & 
Winston. 

Salas, B. A. (1988). The characteristics of dyslexic students 
in public schools: An operational definition. Unpublished 
doctoral dissertation, The University of Texas at Austin. 

Salvia, J., & Ysseldyke, J. E. (1988). Assessment in special 
and remedial education (4th ed.). Boston: Houghton 
Mifflin. 

Sattler, J. M. (1988). Assessment of children (3rd ed.). San 
Diego: Author. 

Shellenberger, S. (1982). Presentation and interpretation
of psychological data in educational settings. In C. R.
Reynolds & T. B. Gutkin (Eds.), The handbook of school
psychology (pp. 51-81). New York: Wiley.

Siriany, A. P. (1989). Developing a slide projection group 
administration procedure for the TONI. Unpublished 
manuscript, Universidad Austral del Chile, Valdivia, 
Chile. 

Slosson, R. L. (1985). Slosson Intelligence Test for Children 
and Adults—Revised. East Aurora, NY: Slosson Educa¬ 
tional Publications. 

Spearman, C. (1914). The measurement of intelligence. 
Eugenics Review, 6, 312-313. 

Spearman, C. (1923). The nature of intelligence and the prin¬ 
ciples of cognition. London: Macmillan. 

Statistical Abstract of the United States. (1985). Washington, 
DC: U.S. Department of Commerce, Bureau of the 
Census. 


Sternberg, R. J. (1980). Factor theories of intelligence are 
all right almost. Educational Researcher, 9, 6-13. 

Sternberg, R. J. (1984). Toward a triarchic theory of human 
intelligence. Behavioral and Brain Sciences, 7, 269-315. 

Svinicki, J. G., & Tombari, M. L. (1981). Developing and 
interpreting local norms: Making test scores work for you. 
Dallas: DLM/Teaching Resources. 

Terman, L. M. (1906). Genius and stupidity: A study of some 
of the intellectual processes of seven ‘bright’ and seven 
‘stupid’ boys. Pedagogical Seminary, 13, 307-373. 

Thurstone, L. L. (1938). Primary mental abilities.
Psychometric Monographs No. 1. Chicago: University of
Chicago Press.

Vance, B. (1988). Concurrent validity of the Quick Test, the
Test of Nonverbal Intelligence, and the WISC-R for a
sample of special education students. Psychological
Reports, 62, 443-446.

Vance, B., Hankins, N., & Brown, W. (1988). Ethnic and 
sex differences on the Test of Nonverbal Intelligence, the 
Quick Test, and the Wechsler Intelligence Scale for 
Children-Revised. Journal of Clinical Psychology, 44, 
261-265. 

Vance, B., Hankins, N., & Reynolds, F. (1988). Prediction 
of Wechsler Intelligence Scale for Children-Revised Full 
Scale IQ from the Quick Test of Intelligence and the Test 
of Nonverbal Intelligence for a sample of referred chil¬ 
dren and youth. Journal of Clinical Psychology, 44, 
793-794. 

Walters, J. M., & Gardner, H. (1986). The theory of multi¬ 
ple intelligences: Some issues and answers. In R. J. 
Sternberg & R. K. Wagner (Eds.), Practical intelligence. 
Cambridge: Cambridge University Press. 

Wechsler, D. (1958). The measurement and appraisal of
adult intelligence (4th ed.). Baltimore: Williams &
Wilkins.

Wechsler, D. (1981). Wechsler Adult Intelligence Scale- 
Revised. San Antonio: The Psychological Corporation. 

Wechsler, D. (1974). Wechsler Intelligence Scale for Chil¬ 
dren—Revised. San Antonio: The Psychological Cor¬ 
poration. 

Wechsler, D. (1967). Wechsler Preschool and Primary Scale 
of Intelligence. San Antonio: The Psychological Corpo¬ 
ration. 

Wiederholt, J. L. (1986). Formal Reading Inventory. Austin, 
TX: PRO-ED. 

Zins, J. E., & Barnett, D. W. (1983). Report writing: Legis¬ 
lative, ethical, and professional challenges. Journal of 
School Psychology, 21, 219-228. 





Appendix 



TABLE A

Converting Raw Scores to TONI-2 Quotients and Percentile Ranks for Form A

                                  Age Interval (years)

       5-0 to     6-0 to     7-0 to     8-0 to     9-0 to     10-0 to    11-0 to
Raw    5-11       6-11       7-11       8-11       9-11       10-11      12-5
Score  Q    %ile  Q    %ile  Q    %ile  Q    %ile  Q    %ile  Q    %ile  Q    %ile

0      69   2     61   .9    <58  <.6   <57  <.5
1      77   6     65   1     58   .6    57   .5    <57  <.5
2      82   12    69   2     61   .9    58   .6    57   .5    <57  <.5
3      88   21    76   6     65   1     63   1     61   .9    57   .5    <57  <.5
4      93   32    80   9     74   4     65   1     63   1     61   .9    57   .5
5      97   42    84   14    79   8     69   2     67   1     65   1     61   .9
6      103  58    87   19    82   12    75   5     72   3     69   2     65   1
7      106  66    91   27    85   16    80   9     75   5     72   3     67   1
8      108  70    95   37    87   19    82   12    79   8     75   5     69   1
9      111  77    97   42    89   23    83   13    81   10    77   6     72   3
10     112  79    98   45    92   30    85   16    82   12    78   7     75   5
11     114  83    102  55    95   37    88   21    84   14    81   10    77   6
12     118  88    106  66    99   47    91   27    87   19    84   14    80   9
13     123  94    111  77    103  58    93   32    92   30    88   21    84   14
14     131  98    116  86    108  70    97   42    95   37    92   30    87   19
15     136  99    121  92    112  79    102  55    98   45    96   39    90   25
16     >136 >99   126  96    114  83    105  63    101  53    98   45    92   30
17                128  97    116  86    107  68    103  58    101  53    95   37
18                131  98    118  88    109  73    105  63    103  58    96   39
19                135  99    119  90    111  77    107  68    105  63    97   42
20                136  99    122  93    111  77    110  75    106  66    98   45
21                >136 >99   126  96    112  79    111  77    108  70    99   47
22                           128  97    113  81    112  79    108  70    101  53
23                           131  98    115  84    114  83    109  73    103  58
24                           136  99    116  86    115  84    111  77    104  61
25                           >136 >99   118  88    116  86    112  79    106  66
26                                      120  91    119  90    114  83    107  68
27                                      122  93    121  92    115  84    109  73
28                                      126  96    123  94    118  88    111  77
29                                      131  98    125  95    121  92    114  83
30                                      133  99    128  97    123  94    116  86
31                                      135  99    131  98    126  96    118  88
32                                      136  99    133  99    128  97    118  88
33                                      >136 >99   134  99    128  97    120  91
34                                                 135  99    131  98    121  92
35                                                 136  99    131  98    122  93
36                                                 140  99.2  135  99    123  94
37                                                 142  99.4  136  99    126  96
38                                                 >142 >99.4 138  99    127  96
39                                                            143  99.5  129  97
40                                                            >143 >99.5 132  98
41                                                                       135  99
42                                                                       136  99
43                                                                       138  99
44                                                                       144  99.6
45                                                                       >144 >99.6
46-55



TABLE A. (Continued) 


Age Interval 


12-6 to 14-&J 
years 

Q %ile 

14-6 to 17-11 
years ' 

Q %ile 

1 18-0 to 23-1 !' 
years 

Q %ile 

24-0 to 29-11 , 
years 

Q %ile 

30-0 to 49-11 
years \j 
Q %ile 

60-0 to 85-1 j 
years J 

Q %ile 

Raw Score 





- 

— 

*<r 


so £-< 

<57 

<.E 

0 

“ 




■— 

— 



<57 

<.6 

57 

.5 

1 

— 

— 

- 

— 

— 

— 

<57 

<.5 

67 

.5 

61 

1 

2 







57

.5 

61 

1 

63 

1 

3 

<57 

<.5 

— 

— 

— 

- 

61 

1

63 

1 

65 

1 

4 

57 

.5

<57

<.5


—

63

1

65 

1 

69 

1 

5 

61 

.9 

57 

.5 

<57 

<.5 

65 

1

69 

1 

70 

2 

6 

65 

1 

61 

.9 

57 

.5 

69 

1

70 

2 

75 

5 

7 

67 

1 

65 

1 

61 

1 

72 

3 

75 

5 

77 

6 

8 

69 

1 

67 

1 

65 

1 

75 

5 

78 

7 

79 

8 

9 

72 

3 

69 

1 

67 

1 

76 

5 

79 

8 

80 

9 

10 

74 

4 

71 

3 

69 

1 

77 

6 

80 

9 

81 

10 

11 

77 

6 

74 

4 

72 

3 

77 

6 

81 

10 

82 

12 

12 

81 

10 

78 

7 

74 

4 

78 

7 

82 

12 

83 

13 

13 

85 

16 

83 

13 

75 

5 

78 

7 

84 

14 

86 

18 

14 

88 

21 

86 

18 

77 

6 

79 

8

86 

18 

87 

19 

15 

90 

25 

87 

19 

77 

6 

79 

8 

87 

19 

88 

21 

16 

92 

30 

88 

21 

78 

7 

80 

9 

87 

19 

88 

21 

17 

94 

34 

90 

25 

78 

7 

82 

12 

87 

19 

88 

21 

18 

95 

37 

91 

27 

79 

8 

82 

12 

87 

19 

89 

23 

19 

97 

42 

91 

27 

80 

9 

83 

13 

88 

21 

89 

23 

20 

98 

45 

92 

30 

81 

10 

83 

13 

88 

21 

89 

23 

21 

100 

50 

92 

30

81 

10 

84 

14 

88 

21 

90 

25 

22 

101 

53 

93 

32 

82 

12 

85 

16 

88 

21 

91 

27 

23 

102 

55 

94 

34 

83 

13 

86 

18 

90 

25 

92 

30 

24 

103 

58 

95 

37 

85 

16 

88 

21 

91 

27 

93 

32 

25 

105 

63 

96 

39 

86 

18 

89 

23 

92 

30 

94 

34 

26 

107 

68 

97 

42 

86 

18 

90 

25 

94 

34 

95 

37 

27 

108 

70 

99 

47

87 

19 

92 

30

95 

37 

96 

39 

28 

110 

75 

100 

50 

88 

21 

94 

34 

97 

42 

98 

45 

29 

112 

79 

101 

53 

89 

23 

95 

37 

98 

45 

99 

47 

30 

114 

83 

102 

55 

90 

25 

96 

39 

99 

47 

100 

50 

31 

116 

86 

103 

58 

92 

30 

98 

45 

100 

50 

102 

55 

32 

117 

87 

105 

63 

92 

30 

100 

50 

101 

53 

103 

58 

33 

118 

88 

107 

68 

93 

32 

101 

53 

103 

58 

104 

61 

34 

120 

91 

110 

75 

94 

34 

101 

53 

103 

58 

105 

63 

35 

122 

93 

111 

77 

96 

39 

102 

55 

104 

61 

107 

68 

36 

123 

94 

113 

81 

98 

45 

104 

61

106 

66 

109 

73 

37 

125 

95 

114 

83 

99 

47 

106 

66 

108 

70 

111 

77 

38 

126 

96 

116 

86 

101 

53 

108 

70 

110

75 

113 

81 

39 

130 

98 

118 

88 

103 

58 

111 

77 

112 

79 

115 

84 

40 

132 

98 

120 

91 

105 

63 

112 

79 

114 

83 

117 

87 

41 

135 

99 

123 

94 

108 

70 

114 

83 

117 

87 

119 

90 

42 

136 

99 

126 

96 

110 

75 

118 

88 

120 

91 

122 

93 

43 

138 

99 

131 

98 

114 

83 

120 

91 

122 

93 

125 

95 

44 

144 

99.6

133 

99 

118 

88 

122 

93 

124 

95 

128 

97 

45 

>144 

>99.6 

135 

99 

122 

93 

124 

95 

126 

96 

136 

99 

46 



139 

99.1 

124

95 

126 

96 

128 

97 

138 

99 

47 



141 

99.3 

126 

96 

128 

97 

130 

98 

140 

99.2 

48 



144 

99.6 

128 

97 

130 

98 

135 

99 

144 

99.6 

49 



>144 

>99.6 

130 

98 

133 

99 

144 

99.6 

>144 

>99.6 

50 





133 

99 

135 

99 

>144 

>99.6 


51 





135 

99 

139 

99.1 





52 





139 

99.1 

144 

99.6 





53 





144

99.6 

>144 >99.6 





54 





>144 

>99.6 







55 
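The Q and %ile columns in these tables track each other closely. As a rough cross-check only, and assuming that quotients are normally distributed with a mean of 100 and a standard deviation of 15, the percentile rank for a quotient can be approximated from the normal curve. The function name `approx_percentile` is hypothetical, and the printed table values, which rest on the normative sample, take precedence wherever the two differ.

```python
from statistics import NormalDist

# Assumption: quotients are scaled to a normal distribution
# with mean 100 and standard deviation 15.
QUOTIENT_SCALE = NormalDist(mu=100, sigma=15)


def approx_percentile(quotient: float) -> int:
    """Approximate the percentile rank for a quotient from the normal curve.

    This is a cross-check for transcription errors, not a substitute
    for looking the value up in the printed tables.
    """
    return round(QUOTIENT_SCALE.cdf(quotient) * 100)


if __name__ == "__main__":
    for q in (70, 85, 100, 115, 130):
        print(q, approx_percentile(q))
```

A check of this kind is mainly useful for spotting a misread cell: a quotient of 100 paired with a percentile far from 50, for example, suggests a copying error rather than a real score.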




















TABLE B 

Converting Raw Scores to TONI-2 Quotients and Percentile Ranks for Form B 



Age Interval 

5-0 to 5-11 years

6-0 to 6-11 years

7-0 to 7-11 years

8-0 to 8-11 years

9-0 to 9-11 years

10-0 to 10-11 years

11-0 to 12-5 years

Raw Score 

Q 

%ile 

Q 

%ile 

Q 

%ile 

Q 

%ile 

Q 

%ile 

Q 

%ile 

Q 

%ile 

0 

70 

2 

69 

2 

60 

.8 

58

.6

<58

<.6 


- 

— 

— 

1 

79 

8 

74 

4 

64 

1 

60 

.8 

58

.6

<57

<.5

— 

- 

2 

87 

19 

77 

6 

72 

3 

63 

1 

60 

.8 

57

.5

<57

<.5

3 

89 

23 

81 

10 

77 

6 

65 

1 

63 

1 

60 

.8 

57

.5 

4 

92 

30 

83 

13 

82 

12 

69 

2 

64 

1 

63 

1 

60 

.8 

5

96 

39 

86 

18 

84 

14 

74 

4 

69 

2 

68 

2 

65 

1 

6 

100 

50

88 

21 

87 

19 

77 

6 

74 

4 

73 

4 

67 

1 

7 

102 

55

91 

27 

90 

25 

78 

7 

75 

5 

74 

4 

69 

2 

8 

104 

61 

94 

34 

91 

27 

79 

8 

77 

6 

76 

6 

72 

3 

9 

108 

70 

97 

42 

93 

32 

82 

12 

80 

9 

77 

6 

73 

4 

10 

112

79 

102 

55

95 

37 

84 

14 

83 

13 

79 

8 

74 

4 

11 

115 

84 

104 

61 

97 

42 

87 

19 

86 

18 

81 

10 

75 

5 

12 

117 

87 

107 

68 

100 

50 

90 

25 

87 

19 

82 

12 

78 

7 

13 

118 

88 

111 

77 

102 

55 

92 

30 

89 

23 

84 

14 

81 

10 

14 

120 

91 

113 

81 

104 

61 

95 

37 

91 

27 

87 

19 

82 

12 

15 

125 

95 

116 

86 

107 

68 

97 

42 

94 

34 

90 

25 

83 

13 

16 

128 

97 

120 

91 

110 

75 

99 

47 

96 

39 

92 

30 

84 

14 

17 

131 

98 

122 

93 

112 

79 

101 

53 

98 

45 

95 

37 

87 

19 

18 

136 

99 

123 

94

114 

83 

102 

55 

100 

50

97 

42 

89 

23 

19 

>136 

>99 

124 

95 

117 

87 

104 

61 

103 

58 

99 

47 

91 

27 

20 



126 

96 

119 

90 

105 

63 

105 

63 

101 

53 

93 

32 

21 



128 

97 

121 

92 

107 

68 

106 

66 

103 

58 

95 

37 

22 



129 

97 

123 

94 

109 

73 

108 

70 

104 

61 

98 

45 

23 



130 

98 

128 

97 

110 

75 

110 

75 

105 

63 

100 

50 

24 



131 

98 

131 

98 

112 

79 

112 

79 

108 

70 

101 

53 

25 



135 

99 

133 

99 

118 

88 

114 

83 

109 

73 

102 

55 

26 



141 

99.3 

136 

99 

123 

94 

115 

84 

111 

77 

103 

58 

27 



>141 

>99.3 

>136 

>99 

126

96 

118 

88 

113 

81 

106 

66 

28 







131 

98 

119 

90 

114 

83 

109 

73 

29 







135 

99 

121 

92 

116 

86 

112 

79 

30 







141 

99.3 

125 

95 

119 

90 

114 

83 

31 







>141 >99.3 

128 

97 

122 

93 

116 

86 

32 








131 

98 

125 

95 

118 

88 

33 









133 

99 

126 

96 

120 

91 

34 









135 

99 

128 

97 

122 

93 

35 









136 

99 

130 

98 

125 

95 

36 









>136 >99 

133 

99 

126 

96 

37 










135 

99 

128 

97 

38

136 

99 

130 

98 

39











>136

>99 

131 

98 

40 













135 

99 

41 













136 

99 

42 













>136 

>99 

43 















44 















45 















46 















47 















48 















49 















50 















51 















52 















53 















54















55 























